PREFACE BY THE PRINCIPAL INVESTIGATOR



This document, The Analysis of Essays by Computer,
is primarily intended as the Final Report for the United 
States Office of Education, for a research contract which 
supported us during 1966 and 1967. Yet it also represents 
the first summary statement of all of the work undertaken 
since early 1965 at the University of Connecticut in such 
essay analysis, and in the simulation of human rating 
behavior. 

It is difficult to trace the genealogy of any idea, 
let alone one as interdisciplinary as that underlying the 
present work. The notion of computer analysis of essays 
began to seem conceivable, following an invitational con- 
ference on data banks, led by John B. Carroll at Harvard 
University in December, 1964. My own experience had in- 
cluded work in many of the contributing fields, so that 
the manipulation of language, as described by Philip Stone 
and others there, drew together many threads into an 
eventually engrossing central problem. 

From the moment of conception, this work has owed 
much gratitude to a succession of able and helpful people. 

J. A. Davis was immediately encouraging, as were Allan B. 
Ellis, William Asher, Dexter Dunphy, and Marshall Smith. 

John Duggan and John Valentine, of the College Entrance 
Examination Board, helped greatly in arranging almost 
immediate financial support. All that we did then and later 
owed much to this prompt generosity of the CEEB, and this
report will also serve as the most unified summation of 
the earliest work done under that support. 

Other generous support, supplementary to that of the 
U.S. Office of Education, has been given by the National
Science Foundation, through its partial funding of the
University of Connecticut Computer Center. Furthermore, 
the Massachusetts Institute of Technology was very helpful
in supporting me as New England Visiting Scientist to their 
Computation Center during 1966-67. Finally, the University 
of Connecticut Research Council has given prompt aid at 
crucial times. 

It would be impossible to list everyone who has been 
helpful with this Project, and there are sure to be impor- 
tant and unintentional omissions. Here at Connecticut, 
many ideas were early discussed with Herbert Garber, then 
with us in the Bureau of Educational Research, with Arthur 
Daigon, with Charles McLaughlin, and with Kenneth G. Wilson. 
These have all served as consultants for brief or longer 
periods of time, and many have contributed ideas or in- 
sights which, because of the nature of this report, are not 
acknowledged explicitly in the text. From the start, the 
Project had, as principal programmers, Gerald and Mary Ann 
Fisher. Mr. Fisher has been a consultant and, for the 
year 1967-68, a Research Associate with us. The programs 
from this employment have plainly been of central importance
to the work.

In mid-1966 Dieter H. Paulus joined the Bureau of 
Educational Research, and has in many ways contributed richly 
to the work since that time. His various contributions 
are mentioned often in the text and he is second author of 
this report and partner in the on-going work. 

Others who helped here in the Bureau of Educational
Research were Miss Louise Patros, together with her willing 
staff of Mrs. Helen Ring, Miss Evelyn Haddad, and Mrs. 
Katherine Showalter. To Miss Patros much gratitude is owed 
for office management functions so important to a large 
research, and to all we are grateful for the preparation of 
this manuscript. Some of the research detail was carried 
out by graduate students here in the Bureau. Their names 
are mentioned in the text, together with their contributions,
wherever these are included in the report. Among these,
Donald Marcotte made contributions which were clearly out- 
standing. 

During the work we have consulted many scholars from 
other institutions, formally or informally, and some of 
them should surely be listed here: Walter and Sally Y.
Sedelow, Robert Stake, Paul Lohnes, Carl Helm, Arthur
Jensen, Paul Diederich, Ross Quillian and Daniel Bobrow, 
Marvin Minsky, Arthur Anger, Bruce Ressler, John Moyne and 
David Loveman, Leslie McLean, William Cooley, John Carroll, 
Larry Wightman, Stanley Petrick and Jay Keyser. William 
McColly early provided us with the original data and worth- 
while ideas. And Julian C. Stanley has served as a con- 
stant source of encouragement and inspiration. 

Those readers seeking a shorter and more general 
introduction to this project are directed to the various 
publications by the workers, listed in the References. For 
a summary of this writing, they may wish to read the first 
section of Chapter IX of this report. 

Ellis B. Page 
Storrs, Connecticut 

CHAPTER I



INTRODUCTION 

When this research was proposed, the time surely 
seemed ripe for a much expanded study of computer analysis 
of student essays. In recent years rapid strides had 
been made in computer hardware technology, in the program-
ming of language-data processing, and in linguistic
analysis. More was known than formerly about the simula-
tion of cognitive products and related fields. Many of
the building blocks, therefore, appeared to be in place or 
nearly so. What remained was to thrust forward into the 
applied and basic problems of essay analysis and grading. 

This study, therefore, aimed at advancing the know- 
ledge of automatic essay analysis as far as theory, practice, 
and facilities would permit within the rather narrow span of 
time permitted. And this report will explain what was 
designed, attempted, and accomplished during this study 
period in this very new and potentially important field of 
research. It will also set forth current understandings 
about the most profitable avenues for further research. 

And this first chapter will explain the background 
for the problem, both practical and theoretical, as well 
as the specific nature of the research attempted. 

(A) The practical background. The practical problems
of "objective" grading have long troubled education and the 
field of psychometrics generally. A single judgment of an 
essay by a single human judge is slow, extremely unreliable, 
and of uncertain status. When sufficient training is used, 
and a sufficient number of judgments establish a decent
reliability, essay grading becomes prohibitively expensive. 
Psychometricians have therefore settled for multiple-choice
items. These have the virtues of wide sampling, since
more questions may be asked within a given time period; of 
high reliability; and of defensible validity, since scores 
often correlate as highly with judgmental ratings as the 
ratings correlate with each other under ordinary condi-
tions.

Nevertheless, educators are far from content with
multiple-choice examinations as the ultimate criterion of 
achievement. They wish to call upon students for global, 
organized responses concerning large questions in substan- 
tive fields. They would like to ask, in testing self-
expression, for direct demonstration of correct and literate
usage. They are often not satisfied by the statistical 
evidence because of inadequate understanding of this evidence, 
and their incomprehension poses a problem for the psycho- 
metrician. More importantly, two objections to multiple- 
choice testing cannot be refuted comfortably at the present 
time: (1) One virtue of any test is the practice which the
testing session gives the student. And it seems clear that
the practice experiences of the student in taking an essay
test are not precisely the same as in taking a multiple-
choice test. (2) Another virtue of any test is the type
of study which its anticipation motivates in the student 
before the test is administered. Many persons believe 
that students study differently for an essay test than for 
a multiple-choice test, differently for "recall" items
than for "recognition" items. Clearer evidence on these 
two objections is needed, but their present status supports 
the desirability of finding some fast, reliable, inexpensive, 
and "objective" system of essay grading. 

In English instruction especially, we have an example 
of a troubled field for essay analysis. Many believe 
that students need far more practice in writing essays in 
elementary and high school years. Yet writing without feed-
back seems generally pointless, and is surely objected to
by the students concerned. And the feedback is very diffi-
cult to systematize. To do the ideal job in essay analysis,
the high school English teacher would have to spend tremend- 
ous amounts of time out of class. Equalizing the load of 
the English teacher with his colleagues in other subjects 
is an unsolved problem. "Lay readers" are tried on an 
experimental basis in a number of schools, but these are an 
additional expense, are relatively untrained, and pose some 
large problems of coordination and aptness of judgment. 
Furthermore, the supply of qualified and interested English 
teachers has always been too limited. It is hoped that 
some way might be found to employ more broadly the talents 
of the few, so that individual judgment and correction of 
essays might be disseminated in the same way as lectures 
may be filmed or exercises may be printed in textbooks. A 
proper program for correction of essays would therefore be 
an attempt to amplify the effectiveness of the more intelli-
gent and talented of graders and correcters. This study
therefore aimed at the type of essay analysis most character- 
istic of English classes. 

The input question. To solve any of these general
practical problems would of course require practical input 
and output. At present, no computer does an adequate job 
of reading ordinary printing or typing, let alone ordinary
handwriting, into correct card images for further data pro- 
cessing and analysis. Yet rapid strides are being made in 
such recognition, and one may hope for resolution of input 
problems before the judgmental problems are completely sat-
isfied. The computerized optical reading of standard type-
script may be only a very few years away. Or, for that
matter, the gradual replacement of much of student hand- 
writing in the schools by inexpensive and noiseless char- 
acter printers (perhaps related to the present Stenotype 
machines) seems a plausible and perhaps early development. 

But even with the present necessity of key-punching IBM 
cards from student copy, practical input for computer
grading is not wholly out of the question. For example, 
the cost of such key-punching ranges below $2.00 per essay. 
Such an input cost, while out of the question for daily 
classroom routine, would not be unreasonable for an occa-
sional master analysis, serving as a basis for extensive
descriptive or prescriptive reporting, for screening or
placement, or for certain other types of evaluation or 
guidance activity. Indeed, present objective-test batteries
often cost much more than that. For the purposes of this 
study, however, it was assumed that input had been trans-
formed into punched cards or card images, and concentra-
tion was on the correction and evaluation problems them-
selves.

(B) The theoretical background. The rather momentous
practical consequences of computerized essay grading will
be some years away. Before these are felt, there were
theoretical questions important to the study, and there
are theoretical answers which may be furnished by the study.
These were psychological and linguistic in nature. Psycho-
logically, for example, what roles do the actual various
prose characteristics play in the cognitive and affective
rating processes? Actual manipulation of prose character-
istics is not anticipated in the present design, and
therefore direct causal relationships will not be infer-
rable, but some important implications for these processes
may turn psychological experimentation into some fruitful
channels.

As a linguistic example, there is the additional
understanding which may be gained of the nature of prose 
description. As Francis (1958) has pointed out, there are 
several kinds of "grammar": among them the prescriptive
grammar, or "etiquette," of the schools, and the descrip-
tive grammar characteristic of modern linguistics. (Also
see "What Grammar?" by Gleason, 1964.) It may be noted
that computer analysis of this proposed kind produces 
still another sort: a set of descriptions resulting
from the computer's own peculiar limitations and abilities.
A list of prepositions may be employed, for example, and
any match with this list may cause a counter to be incre- 
mented. In such a program, some words will be counted 
which the competent human judge would classify in other 
ways: as adverb, subordinating conjunction, coordinating
conjunction, etc. Yet from this NPREP count may result a
description which would be impractical for human judgment, 
which is 100% reliable within the essay, which probably 
has high reliability across essays of the student, and 
which may be useful in predicting the qualitative human 
judgments of the essays. 
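
The NPREP count just described can be sketched in a few lines. The following is only an illustration in Python, not the project's own program; the preposition list, the function name, and the sample sentence are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately short preposition list (an assumption for this
# sketch; the list actually used by the project is not reproduced here).
PREPOSITIONS = {
    "about", "after", "at", "before", "by", "for",
    "from", "in", "of", "on", "to", "with",
}

def count_prepositions(essay: str) -> int:
    """Count every word that matches the list, whatever its true function."""
    words = re.findall(r"[a-z']+", essay.lower())
    return sum(1 for word in words if word in PREPOSITIONS)

sample = "He came to see the town before dark, and he had seen it before."
nprep = count_prepositions(sample)
print(nprep)
```

In this sample the matcher counts "to" (an infinitive marker) and the final "before" (an adverb) along with the one true preposition. This is the kind of misclassification noted above, yet the count is perfectly repeatable for a given essay.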

Furthermore, it was intended to use certain extant 
computer analyzers from other researches, and this was 
done. These are efforts to perform linguistic analysis 
within the sentence, and they are inevitably limited in 
accuracy. The limitation in accuracy need not be a handi- 
cap, however, in terms of useful theoretical and practical 
description. 

The important point here is that the computer may pro-
vide new measurements of language usage and these will
have inevitable importance for theory building and basic 
discovery. These measurements do not presently carry 
heavy theoretical freight, only because they have not been 
observable within the traditional technology. (See later
discussion on this point.) 

More will be said in the final chapter about theoreti-
cal outlooks for such research. It is enough here to note
that both practical and theoretical interests motivated 
the present study. 






Related Research 



The field of essay evaluation by computer represents 
a new focus within the (also new) field of computational 
linguistics, just as it represents a new and divergent 
speciality within educational measurement and educational 
technology. Like all promising new areas of scholarly 
investigation, however, it must draw heavily upon some 
combination of background disciplines not ordinarily con- 
sidered together. This section on related research will 
consider some materials from these background disciplines. 

(a) Background disciplines 

(1) Psychometrics is a basic discipline within which 
any system of evaluation must be justified. The discipline 
already has achieved many technical skills (assessment of 
various forms of reliability and validity) necessary to 
proceeding with the study at hand. Some of the particular 
psychometric problems in content analysis are discussed 
in work by Dexter Dunphy (in Stone, 1966). Important back-
ground work dealing with the reliability of essay grading
by human judges has been done by Diederich, French, and
Carlton (1961), by Myers, McConville, and Coffman (1963),
and by McColly and Remsted (1963), to name only three out-
standing recent examples. In recent years essay testing
has apparently seemed so unprofitable to psychometricians 
that it has been almost wholly neglected. For example, 
the index of a recent Review of Educational Research about 
testing had only one item referring to essay testing and 
it is negative: "problems of unreliability in grading"
(Merwin and Gardner, 1962).

(2) Linguistics has potentially very high relevance
to computer analysis of essay examinations. Important lines
of study have of course emerged from the "generative grammar" 
thinking of Chomsky (1957) and others (e.g., Miller, 1962;
Postal, 1964). The implications of some of these more
scientific approaches to linguistics for a broader psychology
of language have been recognized by Carroll (1964) and
others.

Of course, the particular newer field of this discip- 
line known as computational linguistics is more intimately 
related to the present phases of this work. And this field 
in turn has a large overlap with the field of list-processing 
(see below), and of information retrieval. Many of the
most effective workers in these fields come not directly 
from linguistics training, but from mathematics, psychology, 
and computer science. 

(3) Curriculum. Curriculum, in all fields using
essay examinations, is a concern of central relevance to
the study. This is especially true of language arts educa-
tion, where there are tensions (Gleason, 1964) between the
modern descriptive linguist and the traditional "prescrip-
tive" grammarian (such as Hodges, 1951, or Warriner, 1951),
and what should be taught in composition is by no means
certain (Marksheffel, 1964). Eventually, decisions must
be made about the "right" approaches for any computerized 
master analysis. But for a problem of optimization of 
simulation of human ratings, hypotheses from both camps 
appear useful, and may be empirically checked against the 
criterion. And some interesting light has been cast on 
certain questions of the "etiquette" grammar by work al- 
ready done with this project. 

Although the language arts curriculum is especially 
important, it is by no means unique. Within the present 
research design, the study should produce some interesting 
information for curriculum within other key disciplines
(see the procedures), especially regarding the importance
of special vocabulary.

(4) Automatic language-data processing has been
well described by a number of writers (Green, 1963, ch. 13;
Borko, 1962, pp. 336-423), but one of the best general
accounts is by Garvin and others (1963). In general, there
appear two major methods which are possible: one is the
content-analytic approach, like that used in the "General
Inquirer" (Stone, et al., 1966), and is more a "statistical"
method; the other is more oriented to syntactic and seman- 
tic relationships, as are necessary to the machine-trans- 
lation studies underway, and may be considered a more 
"linguistic" method. Both appear promising for essay grad- 
ing. Of particular potential help appear to be certain 
grammatical-classification computer programs already de-
vised: a part-of-speech decider which is about 95% accu-
rate (Stolz, Tannenbaum, and Carstensen, 1965?), and a
dependency classifier (Klein and Simmons, 1963), which lists
the various different structures possible for a given sen-
tence. Especially significant are two systems already
tried with small subsamples of our data, programs by Kuno
(1964), and by John Moyne of the IBM Boston Programming
Center.

(5) Statistical methodology is like psychometrics 
in having a great body of well-developed doctrine and
practice which may be brought to bear on the present problem. 
An optimization solution may be sought with some standard 
statistical techniques such as multiple regression (e.g., 
Cooley and Lohnes, 1962); or in some sequential, decision- 
making form, such as an operations flow with a series of 
choice points (cf. Simon, 1964); or in some combination of 
the two. The verbal protocols of human raters might lead 
eventually to some appropriate combination.

(6) Computer technology is very important in both
hardware and programming. Advances in machine design,
especially in larger memories and reduced costs, will make
feasible the more complex grading programs at more economi-
cal levels. But present equipment is adequate for exten-
sive exploration of the problem.

Great strides have also been taken in designing 
software suitable for language processing. List-processing 
third-level computer languages are especially appropriate, 
and at least three have been written which are extensions 
of the FORTRAN framework: IPL-V, SLIP (Weizenbaum, 1963),
and DYSTAL (Sakoda, 1964). Another important list proces-
sing language is COMIT (Yngve, 1962a, 1962b), designed
for such work as machine translation. A modification of 
COMIT has been made by Stone (1964) and his associates for 
the "General Inquirer" system at Harvard. (After consider- 
able investigation of computer languages, the present program- 
ming was, except for minor subroutines, entirely done in 
FORTRAN IV. This decision makes possible maximum versatility, 
availability of programmers, and dissemination of programs.) 
Two new developments in software promise increased ease of 
programming within AEC. One of these is STUFF (Puckett,
1966), which provides for string-manipulating functions
embedded in FORTRAN IV. The other is in PL/I list-pro-
cessing (Lawson, 1967), which is promised in an early imple-
mentation of the IBM 360 series (which was installed
at the University of Connecticut in August, 1967).

One of the present lines of work in the field is that 
of the General Inquirer (Stone and Hunt, 1963; Stone, et al.,
1966; Ellis, 1964; Ogilvie, Dunphy, et al., 1962). For cer-
tain purposes, a short dictionary of under 4,000 root words
has accounted for 90-98% of the ordinary written language
analyzed by General Inquirer (Dexter Dunphy and Marshall 
Smith, personal conference with the investigators December 
22 in Cambridge, Mass.). Dictionary lookup procedures are 
crucial to language-processing, and recent developments of
IBM research promise speeds of dictionary reference up to 
10,000 words per minute (Philip Stone, 1964). As mentioned 
elsewhere in our proposal, studies by Simmons and others at 
System Development Corporation, by Stolz and others at 
Wisconsin, and by Kuno at Harvard have made progress in 
relevant software development. 

Still another major line of automatic language-proces- 
sing appears to be the movement toward what may best be 
called "computational humanism," especially concerned with 
data processing to solve the kinds of problems (concordances, 
attribution, influence, style) usually associated with 
literary scholarship. This movement is rapidly gathering 
momentum with conferences, workshops and institutes, and a 
beginning literature, such as the recent book by Bowles
(1967), or the emerging journal, Computer Studies in the
Humanities and Verbal Behavior, now being printed by Mouton
Press, of The Hague.

These six fields, then, contribute to the background 
expertise which is producing a new and potentially useful 
sub-discipline within educational research. The analysis 
of essays by computer is seen to be based upon a number of 
other disciplines, some going back into the nineteenth 
century, but others part of the general growth of behavioral 
science and computer technology within the last several 
decades.



Objectives of the Research 

In general, the objectives of the present study did 
not lend themselves to the clear, Fisherian, "classical" 
experimental designs, because not all operations could be 
foreseen. It did, however, permit clear procedures of 
dynamic development and exploration at each stage of the 
study, and clear verification of accomplishment at the end. 

Properly understood, these characteristics are not handi- 
caps, but symptoms of large research scale. In a recent 
paper, Baker (1965) pointed out that the larger and more
exploratory research project "must be inherently dynamic
and possess the ability to change its internal structure
without sacrificing the rigor of the design" (p. 15).

And another writer (Doyle, 1965) has recently stated that 
as a study approaches the "basic research end of the 
spectrum, it becomes more and more imperative to be free 
to alter the plan. Indeed, in basic research altering 
the plan ought to be a state of mind." With the present 
work, it would be mistaken and even misleading to commit 
the investigation prematurely to too narrow a path. 

In general terms, the objectives of the present study 
were as follows:

(1) To identify important characteristics of student 
prose which are analyzable through specially devised com- 
puter programs. These characteristics were to be aimed 
especially at predicting human judgments of content, organi- 
zation, style, mechanics, and overall quality. 

(2) To develop computer programs for measurement of 
these qualities, or variables related to them, as they 
occur in school essays. 

(3) To analyze the computer-generated objective data 
in relation to subjective measures of the essay dimensions, 
in order to improve the differential accuracy of evaluating 
such essay dimensions. 

(4) To develop through this procedure greater under- 
standing of the human rating process, as applied to objec- 
tively describable prose characteristics. 

(5) To study those aspects of essay description which 
appear most promising for useful feedback to the teachers
and students. In other words, to begin exploration of the 
feasibility of computer commentary about student essays. 

(6) To set forth larger strategies for the most
promising future exploration of computer grading of essays. 

This report tells about the pursuit of these 
objectives, in the following chapters. 

CHAPTER II



THE BASIC DESIGN 

Some fundamental strategies of investigation were
designed early in 1965, and employed in the first data
runs of Project Essay Grade (PEG I), financed primarily
by the College Entrance Examination Board. But that study 
was intimately involved with the present one, and merged 
into it, and completely separate reporting of research 
done under the two sources of support would do some in- 
justice to this continuity. Furthermore, although there 
has been much reporting of all of this work in professional 
publications, at scientific meetings, and in more popular 
news media, there has not been a disseminable technical 
report of any of it. Thus this report will at least 
touch upon all of the work to date. 

Rationale 

We should begin with a general rationale concerning 
the computer grading of essays. This presentation seems 
necessary for two reasons: (1) The computer analysis of
essays seems to some a radical proposal, and is not treated
elsewhere in psychometric literature. (2) The investiga- 
tors intend the present project to open a larger explora- 
tion of such measurement and feedback, with possibilities 
not at all limited to the present work. 

In general, then, there appear to be at least two 
dimensions of the problem of essay grading, with two general 
approaches in each dimension. In the first place, there is 
the content vs. style dimension. Are we interested in what
the student says (e.g., about the discovery of America by
Columbus), or in the way he says it (e.g., his use of punc-
tuation)? We all know that these categories are not mutually
exclusive, but they are useful concepts for our first
orientation (Page, 1966).

In the second place, there is the dimension of rating
simulation vs. master analysis. Are we interested in an 
actuarial approximation of the ratings of human judges 
(e.g., in certain words statistically associated with high 
ratings, even though not themselves regarded as an index 
of correct expression)? If so, we are essentially inter- 
ested in rating simulation. Or are we interested in the 
computer doing a "reading" of language and performing a 
kind of informed and rational "judgment"? If so, we are 
speaking of the computer as master analyst, and of creating 
a kind of "artificial intelligence." These two dimensions 
are pictured in Figure II-1.





                          I              II
                       Content          Style

   A. Rating             I-A            II-A
      Simulation

   B. Master             I-B            II-B
      Analysis

                      Figure II-1

          Possible Dimensions of Essay Grading



Clearly the columns of Figure II-1 are not going to
remain unrelated to each other, since in some ways content 
and style are inseparable. And the column headings given 
are not completely satisfactory. Spelling, for example, is 
a consideration in Column II, yet "style" does not appear a 
satisfying rubric for the marking of spelling errors. 

Similarly, Rows A and B will not remain unrelated either.
As the investigation of simulation discovers variables which
are, empirically, more and more accurately correlated with
human ratings, the analysis will become more profound and
will grow closer to the "meaning" analysis eventually
necessary in Row B. The top row, then, suggests the 
"actuarial approximation" to judging the essay, and the 
bottom row represents the "master analysis" of the essay 
itself. These rows represent matters of computer strategy 
and objectives. 

These rows need further explanation, because they are 
very near the heart of the problem, hence are crucial to 
understanding our progress to date in Project Essay Grade. 
What we have taken as our first goal is the imitation, or
simulation, of groups of expert judges. How we reach this 
goal of successful imitation is not the central question, 
so long as it is reached, and so long as we can actually 
match or surpass the human judge in accuracy and in useful- 
ness. In attacking the problem in this way we are clearly 
not doing a "master analysis" or generating measures of what
the true characteristics of the essays are, as ordinarily 
discussed by human raters. Rather, we are content to settle 
for the correlates of these true characteristics. 

To express this important distinction, we have been 
forced to coin two words: trin and prox. A trin is the
intrinsic variable of real interest to us. For example, we
may be interested in a student's "aptness of word choice,"
or "diction." A prox, on the other hand, is some variable
which it is hoped will approximate the variable of true
interest. For example, the student with better diction
will probably be the student who uses a less common vocabu-
lary. At present, the computer cannot measure directly the
semantic aptness of expression in context, or "diction."
But it can discover the proportion of words not on a common
word list, and this proportion may be a prox for the trin
of diction.
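
The diction prox described above, the proportion of words not on a common word list, might be computed as in the following sketch. It is written in Python purely for illustration; the tiny word list and the function name are hypothetical stand-ins, not the list or routine used in the project.

```python
import re

# Hypothetical stand-in for a common-word list (an assumption for this sketch).
COMMON_WORDS = {
    "a", "and", "he", "in", "is", "it", "of", "that", "the", "to", "was",
}

def uncommon_proportion(essay: str, common=COMMON_WORDS) -> float:
    """Proportion of words in the essay that are not on the common-word list."""
    words = re.findall(r"[a-z']+", essay.lower())
    if not words:
        return 0.0  # an empty essay has no words to classify
    uncommon = sum(1 for word in words if word not in common)
    return uncommon / len(words)

prox_diction = uncommon_proportion("The cat is in the hat")
print(round(prox_diction, 3))
```

The function measures nothing about aptness in context. It yields only a number that may or may not correlate with judged diction, and that question is settled empirically.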

Or another illustration: We may be interested in the
complexity of a student's sentences, in the branching or
dependency structures which he has the maturity to employ.
Such sentence complexity would, therefore, be a trin. But
the sentence-parsing programs for computers which exist
now are not completely satisfactory for our purposes. We
might therefore hypothesize that the proportion of preposi-
tions, or of subordinating conjunctions, constitutes a prox
for such complexity. And we might therefore employ this
proportion, too, in our computer analysis.

One more essential, and the basic strategy of our first
essay grading project may be understood: We have begun by
saying that the basic evaluation of overall essay quality
must be human. But which human? If only one expert English
teacher grades an essay, we know that the judgment will not 
be very dependable. We know that other judges will reach 
a somewhat different conclusion, and even the same judge, 
if he were grading it again, would probably shift his eval- 
uation. The typical inter- judge agreement is represented 
by a correlation coefficient of only about .50. On the other 
hand, when a group of independent experts have graded an 
essay, and when these grades are averaged, this average has 
a rapidly improving dependability. When four judges, for 
example, grade an essay independently, their average judgment 
will correlate with the average of four other judges about 
.80. So it is possible to get reliable human judgment of 
essay quality. But it is extremely, prohibitively expensive 
and time-consuming when applied to any large-scale testing. 
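
The figures quoted above (.50 for one judge, about .80 for the average of four) are consistent with the Spearman-Brown formula for the reliability of pooled judgments. The report does not name that formula, so its use here is an assumption; the short Python sketch below merely checks the arithmetic, and the function name is illustrative.

```python
def pooled_reliability(single_judge_r, n_judges):
    """Spearman-Brown estimate of the reliability of an average of n_judges
    ratings, given the typical correlation between two single judges."""
    return n_judges * single_judge_r / (1 + (n_judges - 1) * single_judge_r)

# Four judges with a typical pairwise agreement of .50:
four_judge_estimate = pooled_reliability(0.50, 4)
```

With a single-judge agreement of .50, four judges give 2.0 / 2.5 = .80, which matches the value stated in the text.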

However, getting a reliable human judgment is not too 
expensive for a sample of essays. If we can find a way to 
imitate, then, what the expert human judges do with this 
sample, and if we apply this strategy to a computer program 
for a huge number of other essays, we capture high quality 
of judgment at low cost. And the techniques used to analyze 
the judgment and reproduce it are essentially those already 
so well developed in standard prediction problems. 

The strategy, then, is very general indeed: if the
computer may be programmed to simulate some sample, the resulting
algorithm may be employed on arbitrarily large
numbers of essays drawn from the same population as the 
sample. The validity of any evaluation and analysis will 
then depend on basic conditions which are already very 
familiar, from measurement work, to the psychometrician: 
on the number of judges used to establish criterion evalua- 
tions; on their quality; on the "set" of the judges; on the 
number of essays evaluated; on the nature of the essay 
sampling; on the frequency and consistency of the proxes; 
and so on. And powerful, well-understood statistical tools 
may be brought to bear on the simulation. 

One technique for such simulation, where the appro- 
priate weighting of each prox is unknown beforehand, would 
be the familiar multiple regression, in which one criterion
variable (in this case the human judgment, or trin) may be
optimally predicted by a discovered weighting of a number
of predictors (in this case, the computer proxes). And
indeed, this general tool of multiple regression, implemented 
by appropriate computer programs, has proved very powerful 
for essay grading, both in the initial strategies and in 
the later ones. 
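
As an illustration of the regression step, the following Python sketch fits least-squares weights for a handful of proxes by solving the normal equations. It is a minimal sketch, not the project's program: the function names are invented here, and a production analysis would use an established statistical package.

```python
def fit_weights(prox_rows, ratings):
    """Least-squares weights (intercept first) via the normal equations."""
    X = [[1.0] + list(row) for row in prox_rows]
    n = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(X, ratings))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("proxes are collinear; weights are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(weights, prox_row):
    """Predicted rating for one essay from its prox values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], prox_row))
```

Each row of `prox_rows` holds the prox values for one essay, and `ratings` holds the corresponding pooled human judgments; the fitted weights may then be applied to essays that no human has rated.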

To summarize the general design, then: (1) Essays
to be evaluated must (at present) be key punched for computer
input. (2) These essays must be independently evaluated
by human judges (of any desired characteristics), on various
traits (depending on the research hypotheses). (3) Hypotheses
must be generated by other human experts, concerning the
programming of appropriate proxes for evaluation. (4) Those
hypotheses, depending on convenience and promise, must be
programmed into the computer analysis. (5) The machine-readable
essays are passed through the computer, and the
proxes recorded for each essay. (6) These proxes are then
optimized for the best possible prediction of the pooled
human judgments.

The flexibility of the general design is clear. It
allows for any appropriate selection of judges, any selection
of proxes, of traits to be predicted, of essays, etc. Thus,
this design has a great capacity for repeated use as our 
knowledge of essay grading broadens and deepens, and as 
its concerns expand to include all parts of the universe 
of Figure II-1.

In this study, the attention first focused on simulation 
of ratings of overall quality of style. Then the concentra- 
tion shifted to ratings of various essay characteristics 
(content, organization, style, mechanics, and creativity).
A variety of subproblems were considered, hypotheses were
tested, and phrase-recognition procedures were implemented.
Currently, attention is expanding to include the subject-matter
knowledge exhibited, and more intensive linguistic
strategies. But the basic design is easily adapted to these
and other shifts of focus, as research interests become more 
sophisticated, and exhibit greater breadth and depth. In- 
deed, even with the advanced strategies projected in the 
final chapter of this report, it is difficult to imagine a 
time when such actuarial strategies will not constitute an 
important part of some final decision process. 

CHAPTER III

THE INITIAL PROXES

This chapter will describe more of the fundamental 
thinking to date about computer analysis of essays at the 
University of Connecticut. First this report will con- 
sider the 1965 work, which predicted judgments of the over- 
all writing quality of a set of essays, and second the 
later expanded work, predicting a more complete profile of 
judgments on a number of essay characteristics or traits. 

This particular chapter will be concerned with the sampling,
procedures, proxes, and programs devised for such analysis. 

Sampling. The basic research design has been described
in Chapter II. Since there was great flexibility permitted
in selection of essays, and since the investigators were
eager to explore the parameters of this field, a search was
conducted for essays which would have certain desired 
characteristics. What seemed desirable were essays which 
(1) were already written under carefully described circum- 
stances; (2) had ratings by multiple human experts already 
assigned, independently of one another; (3) were drawn 
from a student population heterogeneous enough to furnish 
a reasonable reliability for rating sums; (4) were long 
enough to furnish stable measurements of at least some 
prose characteristics; (5) were multiple for each student, 
so that some estimate could be made of test-retest relia- 
bility; (6) were general enough so that findings might have 
fairly wide applicability; (7) were accompanied by correla- 
tive information about the students; (8) were representative 
of a random sample of the target student population; (9) were 
large in number. 

A sample of essays fulfilling most of these requirements
was obtained in 1965 through William McColly, then
of State University of New York, Oswego. For an earlier 
experiment in composition teaching, McColly and Remstad 
(1963) had arranged for English classes at Wisconsin High 
School (Madison) to write four essays, on four different 
topics, about one month apart. These had indeed been
(1) written under carefully described circumstances;
(2) given four independent ratings for "overall writing
quality"; (3) drawn from a heterogeneous student population,
representing grades eight through twelve, with an average
IQ of about 114; (4) of an average length of over 300
words; (5) four in number for each student; (6) written
on rather common themes, such as whether the "best things 
in life were really free", or whether "anger" could have 
good uses; and (7) accompanied by fairly extensive informa- 
tion about the student writers. Since they were from one 
(rather atypical) high school, they could not be said to 
represent a random sample from the secondary population of 
the United States. On the other hand, for such
exploratory research, the proposed experimental analyses were so
broad that subtle interactions with ability levels, or with 
other levels of student population, were believed of small 
initial concern. Finally, the number of the essays was 
substantial, with well over 250 essays for each of the four 
writing sessions. For multivariate analysis especially,
large numbers of cases are very important. 

The question of interjudge reliability is of great 
importance, since any optimization technique, such as 
multiple regression, must have a decently reliable criterion 
if it is to produce any nonrandom results. The overall 
ratings assigned by the Wisconsin judges had an average 
interperson agreement of about .5, and an analysis-of- 
variance reliability for four such judgments pooled of 
around .83 (McColly and Remstad, 1963, p. 49). So high
a reliability would give the sums (or averages) a sufficient
stability for use as a criterion.

Hypotheses and proxes. Having defined the criterion
and established a suitable sample, the next important task 
was to determine what hypotheses were appropriate, i.e., 
which of the available hypotheses could be shaped into 
suitable algorithms to provide proxes for the multiple 
regression. Clearly, it would have been ideal if we could 
have incorporated into a massive computer program nearly 
the whole of standard texts on usage and rhetoric, such as 
the Harbrace Handbook (Hodges, 1951). That is, in one 
sense, still the target of such work, but no one dreamed 
that anything approaching such a goal could be implemented 
into the study at such an early time. The problems were 
not simply economic and logistic. More importantly, they 
stemmed from fundamental uncertainty about the nature of 
language and of the human reading process. The present 
status of such work will be considered under suggested 
future strategies. Here shall be discussed the sort of
thinking generated in conferences of consultants (Daigon, 1966).

The agreement between independent raters of the essays 
will indicate the degree to which the essays themselves 
(rather than the independent personalities, moods, biases, 
etc., of the judges) influenced the ratings. That is, the 
inter-rater agreement is a function of the physical influence 
of the word patterns of the essays. In principle, therefore, 
the computer is limited in its simulation of the group judg- 
ment not by any spiritual nature of the essay itself, but 
only by the extent to which the computer program can be 
designed to reflect the group responses (Page, 1967b). 

These group responses may be presumed to be related 
to certain intrinsic characteristics of prose. These in- 
trinsic characteristics may deal with mechanics, with 
organization, with diction, etc. They are described in
detail in prescriptive grammars, and elsewhere, and may be 
further elaborated by the project's investigators and con- 
sultants. On the other hand, some characteristics of 
ultimate interest, some trins, may be unmeasurable with
present knowledge and technology, and some possible approx- 
imation to them may be studied, in the hope that these 
second-order variables will be correlated with the trins. 

As one example, spelling may be considered a trin, 
or almost so. The simplest effective strategy for analysis 
of spelling with available computer technology was to use 
a list of misspellings. A list of several thousand common
spelling errors in their misspelled forms (e.g., Gates,
1937, with later supplement) will, consultants agreed,
possibly account for many misspellings in high school papers. 
Each word in each essay may be looked up in such a computer- 
stored list, therefore, and a student's "misspelling score" 
augmented by one point whenever such a word is encountered 
for the first time. Not all student misspellings will be 
discovered by this method, but scores so generated would 
be correlated with the "true" spelling scores as might be 
discovered by human examiners, and any given misspelling 
is a trin. There are other available trins. Ungrammatical 
combinations of words, examples of generally poor diction, 
and other solecisms may be similarly discovered and tabu- 
lated from comparison with such lists, and may also be 
considered trins, considered individually. 
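
The list-lookup strategy for spelling may be illustrated as follows. This Python sketch rests on two assumptions: the three list entries are hypothetical stand-ins for a list of several thousand forms, and "encountered for the first time" is read here as meaning that each distinct misspelling counts once per essay.

```python
import re

# Hypothetical excerpt; a working list would hold several thousand misspelled forms.
MISSPELLINGS = {"thier", "beleive", "dont"}

def misspelling_score(essay):
    """Add one point the first time each listed misspelling appears in the essay."""
    seen = set()
    score = 0
    for word in re.findall(r"[a-z]+", essay.lower()):
        if word in MISSPELLINGS and word not in seen:
            seen.add(word)
            score += 1
    return score
```

As the text notes, misspellings absent from the list go undetected, so the score is a lower bound on the errors a human examiner would find.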

On the other hand, what of the "less mechanical" 
questions of content, organization, thought pattern? Let 
us consider an example of a prox: The Harbrace College
Handbook (Hodges, 1951) contains a chapter on "the paragraph."
Surely the judgment of paragraph organization
is one of the loftier goals to which the project may aspire, 
and a fully satisfactory simulation may be some good time 
away. But consider certain rules given by Hodges for the
paragraph. His Rule 31b is: 

Give coherence to the paragraph by so inter- 
linking the sentences that the thought may 
flow smoothly from one sentence to the next. 

(p. 330) 



This rule is of course too general to afford much 
help. But Hodges has given more prescriptive help in the 
five sub-rules [each provided with examples not reprinted
here]:

(1) Arrange the sentences of the paragraph in
a clear, logical order.

(2) Link sentences by means of pronouns referring
to antecedents in the preceding sentences.

(3) Link sentences by repeating words or ideas
used in the preceding sentences.

(4) Link sentences by using such transition
expressions as the following:

ADDITION: moreover, further, furthermore,
besides, and, and then, likewise, also,
nor, too, again, in addition, equally
important, next, first, secondly,
thirdly, etc., finally, last, lastly

[etc., through other longer lists]

(5) Link sentences by means of parallel structure,
that is, by repetition of the sentence
pattern. (pp. 330-335)

These rules suggested some good researchable hypotheses.
Number (4), with its extensive list of words believed
appropriate to link ideas in different ways, was the most
convenient, and was researchable through a straight
dictionary-lookup procedure like that used for spelling.
The question is then to what degree such words may be a
prox for the trin of paragraph organization. Similarly,
Number (3) may be researchable, if the repetition of words
is alone researched. The repetition of ideas would clearly
depend on a dictionary or thesaurus beyond the scope of the
immediate project. For Number (2), a prox might be the
number or proportion of such pronouns occurring after the
first sentence in any paragraph. (The complicated questions
of pronoun reference again depend on distant developments
in semantic and syntactic analysis.) Hodges' other rules
may perhaps be approximated rather remotely, but argue
for developing or adapting a syntactic sentence analyzer.
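
The proxes suggested for Numbers (4) and (2) may be sketched as follows. This is a hedged Python illustration: the transition words are drawn from the ADDITION list quoted above, the pronoun list is a hypothetical stand-in, and sentences are split naively on terminal punctuation.

```python
import re

# Transition words from the ADDITION list quoted above (single words only).
TRANSITIONS = {"moreover", "further", "furthermore", "besides", "likewise",
               "also", "next", "finally", "lastly"}
# Hypothetical pronoun list, for illustration only.
PRONOUNS = {"he", "she", "it", "they", "this", "these", "those", "him", "her", "them"}

def split_sentences(paragraph):
    """Naive split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def transition_prox(paragraph):
    """Proportion of words that are listed transition expressions."""
    toks = words(paragraph)
    return sum(t in TRANSITIONS for t in toks) / len(toks) if toks else 0.0

def pronoun_link_prox(paragraph):
    """Proportion of listed pronouns among words after the first sentence."""
    later = [w for s in split_sentences(paragraph)[1:] for w in words(s)]
    return sum(w in PRONOUNS for w in later) / len(later) if later else 0.0
```

Neither function attempts pronoun reference or multi-word expressions such as "in addition"; both only count surface forms.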

Another example of a trin was word fluency. This
variable was clearly difficult to measure mechanically,
since it would often depend upon semantic understandings,
and these were generally beyond the scope of available
technology. Nevertheless, possible proxes suggested themselves.
Lists of "common words" exist (Lorge, 1959). The
words of essay text may be looked up in such lists and, 
where unlisted, scored appropriately. The ratio of such 
unlisted words to total number of words may be included in 
the multivariate analysis to determine whether it aids in 
predicting evaluative rating. Or another approach, closer 
to a "content" analysis, would be to check for the presence 
of certain words suggested by dictionary or thesaurus as 
synonyms or near-synonyms of some thematic words. And ex- 
tensive work of this kind is currently underway in a new 
phase of the research. 

In short, the hypotheses for the trins underlying the 
human ratings were very numerous, and preliminary thinking
of this sort, both initially and through the following two 
years of work, occupied a fair share of the time of consult- 
ing experts. As always with multivariate research, it 
would be far too cumbersome to recount the entire chain of 
thinking leading to each specific prox employed, yet some 
explanation will be included in the next section. The most 
obvious and general hypothesis for all trins was that the 
papers receiving better human marks would tend to be written 
in a style more conformable with the standard textbooks. 

Hypotheses and proxes. The first 30 proxes which we
settled upon grew out of several considerations: (1) We
would first decide which trins were ideally measurable;
but as we have seen, such a list included almost the en- 
tire handbook of usage, with most points defined very in- 
tuitively. (2) We would then decide what short-cuts 
might be taken to an approximation of such trins; where 
these were easily manageable, they would be programmed into 
the analysis. (3) We would furthermore have, from the 
nature of our text analysis, a number of variables which 
would be fortuitously and easily come by; and these might 
be examined routinely for possible assistance in prediction. 

Ordinarily, as almost all methodologists believe 
(e.g., Tatsuoka and Tiedeman, 1963), research should be 
primarily theory-oriented, i.e., directed by hypothesis 
and associated deduction. Yet multivariate analysis does
not really lend itself to complete explication and test of
each separate hypothesis, and in general prediction research
would be unnecessarily and artificially restrained if it
were not permitted use of any convenient predictors, regardless
of the vagueness of rationale for their inclusion.
There were in this study a fair number of what might be
called, therefore, "proxes of opportunity." Some data
about each of the initial proxes will be reported later.
Here they will be listed, and briefly explained.

1. Title present or absent. It was early noticed
that some students did write a title, and some did not.
It was guessed, provided there were a fair division on
this point, that the better students would be somewhat more
apt to compose titles; and there would be therefore an
expectable positive correlation with human ratings.

2. The average sentence length is a variable of
considerable interest. If a sentence is defined the way
the student writer defines it (that is, as a string of
words between non-abbreviating periods), then there is not
much evidence to expect more than a slight correlation
with quality. Kellogg Hunt (1966), for instance, has
shown that mean sentence length remains fairly constant
with advancing school age. On the other hand, it might be
supposed that a combination of sentence length and dependency
relations would be reasonably important: that sentence
length without such internal dependencies might be a sign
of the poor writer, the run-on style; but that sentence
length with such dependencies might be a sign of greater
language maturity.



3. The number of paragraphs will often be very small 
for a really immature writer, just as other forms of 
linguistic markers and conveniences will also be under- 
utilized. Thus it was predicted that frequency of para- 
graphs would be positively correlated with writing quality. 

4. Subject-verb openings are the sentence beginnings 
where the subject phrase is apparently first. Without a 
parsing program, this variable was only approximated, and 
it was done so on the assumption that the first word would 
in the majority of cases be adequate for decision. Any 
pronoun, article, abstract noun, etc., will typically signal 
a subject opening, whereas an adverb, subordinating conjunction,
etc., will typically signal a left-branching sentence.
An essay's score on this variable, then, would be represented
as a ratio of subject openings to total number of sentences.
A common youthful failing is a stodgy, mechanical style
without variation, while the sign of the more mature writer
is a variety of sentence structures, depending on the purpose
of the sentence. Therefore the prediction was that the subject-verb
proportion would be negatively associated with
writing quality.
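
The first-word approximation described for prox 4 may be sketched as follows. The opener list is a hypothetical stand-in (the report does not reproduce its own), and the sentence splitting is deliberately naive; the Python fragment only illustrates the ratio being computed.

```python
import re

# Hypothetical first words taken to signal a subject opening.
SUBJECT_OPENERS = {"the", "a", "an", "i", "he", "she", "it", "we", "they", "this"}

def subject_opening_ratio(essay):
    """Ratio of sentences whose first word suggests a subject opening."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", essay.strip()) if s]
    if not sentences:
        return 0.0
    hits = 0
    for s in sentences:
        first = re.findall(r"[a-z']+", s.lower())[:1]
        if first and first[0] in SUBJECT_OPENERS:
            hits += 1
    return hits / len(sentences)
```

A sentence opening with an adverb or subordinating conjunction (for example, "When ...") is simply not counted, which corresponds to the left-branching case in the text.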

5. Length of essay in words is surely a characteristic
associated with advancing maturity and skill; and it is a
commonplace correlative of high ratings from human judges.
Here the prediction was that essay length would help in
the prediction of the mark received, and would be positively
correlated with writing quality.

6. The frequency of parentheses might be supposed 
characteristic, in a high school sample, of writing fluency. 
Among poor writers, many of these common tools do not seem 
to be a part of the available repertory, and it might there- 
fore be predicted for parentheses, as for other marks of 
punctuation, that they would be positively correlated with 
writing quality. (Here and for similar subsequent counts, 
the frequency should be taken to mean a ratio of the item 
to the appropriate total of the essay. In this case, the 
number of words is used as the control for length. Other- 
wise, length of essay would be a hidden, contaminating 
factor in most of the proxes.) 
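
The length control described in the parenthetical note may be illustrated by a short Python sketch. The function name and the word-matching rule are assumptions made here for illustration; the point shown is only that each punctuation count is divided by the number of words.

```python
import re

def mark_frequency(essay, mark):
    """Occurrences of a punctuation mark per word of essay text."""
    n_words = len(re.findall(r"[A-Za-z']+", essay))
    return essay.count(mark) / n_words if n_words else 0.0
```

For example, an essay of four words containing two commas has a comma frequency of 0.5, whatever its absolute length.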

7. Apostrophes are in a somewhat different category. 
While it is plainly more correct to write DON'T than DONT, 
it is somewhat better usage, or at least more formal usage, 
to write DO NOT. Frequent apostrophes might be supposed to 
mark a rather informal or casual style, and it might be 
supposed that informality is on the whole negatively regarded 
in a set theme assignment. On balance, therefore, apostrophes 
were predicted to correlate negatively with writing quality. 

8. The frequency of commas might be the most reliable 
measure of the student's repertory of punctuation facili- 
ties, since commas are more common than any other mark. It 
was predicted, then, that comma frequency would be positively 
correlated with quality in a high school setting. 

9. The frequency of periods is not, like frequency of 
commas, a mark of writing fluency, since it may be evidence
of short sentences, or of abbreviations. Neither of these 
would be considered an asset in such a formal assignment. 

10. The frequency of underlined words was predicted
to be slightly, but positively, correlated with writing
quality, under the simple assumption which also governed
parentheses and commas. Similar predictions were made
for the following punctuation marks:

11. Dashes
12. Colons
13. Semicolons
14. Quotation marks
15. Exclamation marks
16. Question marks

and, out of order:

26. Hyphens
27. Slashes

17. Prepositions are



first place, it was not possible to design an algorithm to 
be very sure about the accuracy of category. For the initial
programs, a word was a "preposition" if it was found in a
computer-stored dictionary of prepositions, though to the
human expert it might be serving as an adverb or subordinating
conjunction, etc. Prepositions are common words, of
course, yet it was predicted that they would be positively
associated with writing quality, simply because their frequency
would imply dependency substructures within the
sentence. When sentence length is held constant, as was
noted for #2 above, one might suppose that preposition frequency
would vary positively with quality.

18. Connective words, such as nevertheless, however,
and also, were assumed to characterize language marked by
complexity of relationship, and thus were hypothesized to
correlate positively with writing quality.



19. Spelling errors are of course the most obvious 
and objective characteristic of writing which is poor 
mechanically. In this test, no attention could be given 
to the errors which are simply misplaced homophones (such 
as THEIR and THERE), nor to other errors which were guessed
low in frequency. Rather, the list consisted of some of
the commonest misspellings which are wrong in any context
(e.g., THIER, BELEIVE, DONT). And the assumed direction
was that there would be a negative correlation between such 
occurrences and the human judgment of writing quality. 

20. Relative pronouns are another set of words used 
by able writers to marshal and interrelate their thoughts.
Therefore it was predicted that there would be a positive 
correlation between such words and essay quality. 

21. Subordinating conjunctions were similarly expected
to correlate positively with essay quality, for the
same reasons as those above: that such words are important
and relatively advanced tools for imbedding sentences and
relating one thought to another.

22. The proportion of common words in an essay was
determined by mechanically looking up each word in the Dale and
Chall (1948) list of common words, and dividing the number
of such occurrences by the total number of words in the
essay. Setting aside misspellings (some of which would
be caught by other dictionaries), we would expect that
those essay words not on such a common list would probably
be less frequent and more discriminating selections, and
would usually represent better diction. Therefore we predicted
a negative correlation between such common words
and essay quality.

23. The occurrence of a sentence with a missing final 
period is very hard to find, with present computer programs. 
However, at the end of a paragraph, a missing period is 
obviously easy to detect, and this mistake does occur
among very immature or careless writers. It would be predicted
that where such an error did occur, it would be
negatively correlated with writing quality.

24. This item, declarative sentences type A,
and the next item, treat an attempt to locate sentences
where question marks are mistakenly omitted. Any sentence
ending with a period was here taken to be a "declarative"
sentence. Then the first word is examined to ascertain
whether the sentence might be interrogative in syntax. If
the sentence begins with any of the common question introducers,
such as WHO, HOW, WHERE, etc., it is taken to be a
"declarative sentence type B," meaning that there is a
Boolean conjunction of a possibly interrogative first word
with a non-interrogative terminal punctuation. A "declarative
sentence type A," then, is one in which there is no
evidence for an interrogative sentence either in the first
word or in the terminal punctuation. By this algorithm,
then, the sentence is consistently declarative, and may be
better correlated with the criterion than would be the type
B sentences.

25. For these "declarative sentences type B," therefore,
one might predict, if anything, a negative correlation
with quality.
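
The type A and type B classification of items 24 and 25 may be sketched as follows. In this Python illustration only WHO, HOW, and WHERE come from the text; the remaining question introducers, the function name, and the naive sentence splitting are assumptions.

```python
import re

# WHO, HOW, WHERE are named in the text; the others are assumed additions.
QUESTION_OPENERS = {"who", "how", "where", "what", "when", "why", "which"}

def classify_declaratives(essay):
    """Count period-terminated sentences as type A (no interrogative evidence)
    or type B (interrogative first word but a terminal period)."""
    counts = {"A": 0, "B": 0}
    for s in re.split(r"(?<=[.!?])\s+", essay.strip()):
        if not s.endswith("."):
            continue  # questions and exclamations are not "declarative" here
        first = re.findall(r"[a-z']+", s.lower())[:1]
        if first and first[0] in QUESTION_OPENERS:
            counts["B"] += 1
        else:
            counts["A"] += 1
    return counts
```

A correct sentence such as "When it rains, we stay in." would also be counted as type B under this rule, which is one reason the text predicts only a weak relationship for that item.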

26-27. Punctuation marks, already discussed above.

28. The average word length in letters might be pre- 
dicted of considerable actuarial importance, because we 
know from Zipf's law that word length is correlated with
word rarity, and word rarity may be presumed correlated 
with broader vocabulary and more accurate diction. Thus 
the predicted relationship with quality would be positive. 

29. The standard deviation of word length might be
presumed to be highly correlated with the length itself,
but it was thought that the additional information about
dispersion might add to the total regression. This prox
would also be predicted to correlate positively with the
criterion.

30. The standard deviation of sentence length would 
not be presumed, necessarily, to correlate very closely 
with the length of sentence, since it is a common observa- 
tion that many persons write consistently short sentences, 
or consistently long ones. What would appear ideal is a
mixture of long and short sentences, as appropriate to the 
context, and one would therefore predict a standard devia- 
tion of sentence length which would be positively associ- 
ated with quality. 
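
Proxes 2 and 28 through 30 (means and standard deviations of word length and sentence length) may be computed along the following lines. This Python sketch uses the population standard deviation, which is an assumption, since the report does not say which form was used; the function name is likewise illustrative.

```python
import re
from statistics import mean, pstdev

def length_proxes(essay):
    """Return mean and SD of word length (letters) and sentence length (words)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", essay.strip()) if s]
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    word_lengths = [len(w.replace("'", "")) for w in re.findall(r"[A-Za-z']+", essay)]
    if not word_lengths:
        return {"word_mean": 0.0, "word_sd": 0.0,
                "sentence_mean": 0.0, "sentence_sd": 0.0}
    return {
        "word_mean": mean(word_lengths),
        "word_sd": pstdev(word_lengths),
        "sentence_mean": mean(sentence_lengths),
        "sentence_sd": pstdev(sentence_lengths),
    }
```

An essay written wholly in sentences of one length receives a sentence-length standard deviation of zero, the case the text associates with the less varied writer.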

In summary, these initial proxes were justified partly 
on rational grounds, partly on common sense observations, 
and partly by expert opinion. As we shall see later on, 
most of the predictions were discovered to be in the right 
direction, though not all; and some were considerably less 
or more effective than we had foreseen. 

The Computer Program. Having decided upon the basic
proxes for the first studies, it was necessary to choose
a programming language for their implementation. This is
not a trivial decision, since the world of "natural-language"
programming, as it is called, has been and is a rather
chaotic one. For some large-scale researches, through the
past years of programming for natural language analysis,
efficiency has been extremely important, both for time and
money considerations. Consequently, some of the most
important work in language translation (see Oettinger, 1960),
linguistic analysis (Garvin, 1963; Borko, 1967), content
analysis (Stone et al., 1966), and information retrieval
(Becker and Hays, 1963) has been programmed in symbolic
languages close to the machine, such as FAP or MAP. And
these low-level languages not only make changes difficult
and buggy, but also are extremely difficult to move from
one machine configuration to another. Such programs are of
little help to the new researcher in natural language work.

At the other extreme are high-level and sometimes quite 
abstract languages which have been used for frontier work 
in psychology, management science, linguistics, and arti- 
ficial intelligence. Such languages are COMIT, IPL-V, 
DYSTAL, LISP, SNOBOL, and SLIP. These and others have been
designed for list-processing, dynamic-storage applications,
and often pay heavily in speed and convenience for the
flexibility and elegance suitable to such applications.
These were also surveyed rather extensively for any suitability
for our system needs.

Ultimately, the choice of programming languages for
such a purpose should be governed by these rather overlapping
considerations: (1) Is it easy to program, and
easy to modify? (2) Are the relevant programming skills
already available in the research team? (3) Will the program
in general outlive the rapid and inevitable machine
changes across the years? (4) Will other researchers be
able to adapt it easily? (5) Is it natural to our own
systems tape? (6) Is it a mnemonic language, easy to
comprehend?

In light of such considerations and after some false
starts with COMIT, the investigators decided upon FORTRAN
IV, for the following reasons: Our own computer installation
at the University of Connecticut was at that time
a rather new IBM 7040, with extensive FORTRAN IV facilities
as part of the regular system tape. FORTRAN was the most
widely used programming language in the computer world,
with large numbers of available programmers. It furthermore
promised to be available at almost all large computer
centers for years to come. It is relatively machine-independent,
with the exception of a few considerations of
word capacity and other matters.



Especially, FORTRAN seemed suitable because, when our 
problem was spelled out carefully, list-processing and 
dynamic storage were not yet necessary to anything we wished 
to accomplish. Such facilities are excellent conveniences 
for certain types of problems; but the better we came to 
understand our early needs, the more obvious it was that 
we needed the following: 

(1) A way of organizing character strings into
ordinary alphameric arrays, each row of such an array representing
a recognizable "word", in the usual language
sense. This organizer would also need to set aside punctuation
marks and other non-words.

(2) A way of reading special dictionaries into 
immediate-access storage, for easy comparison with the 
words of the student text. 

(3) A way of efficiently counting occurrences of 
such dictionary words, for any student sentence and any 
essay. 

(4) A way of checking on various other, non-dictionary 
events in the student text. 

(5) A way of summarizing the proxes for an essay. 

FIGURE III-1

GENERAL FLOW CHART FOR FIRST PROGRAM
PROJECT ESSAY GRADE
(ESSAY ANALYSIS)

These general goals are shown in only slightly more
rigorous a way in Figure III-1, which is a flow chart of
the first program outlines. Here it is seen that our
dictionaries were input on punched cards, and were stored
in core, in what are called double-precision arrays. For
many readers, this requires some explanation. The core
storage of the IBM 7040 was at that time limited to 32,000
computer registers, in which each register was limited to
six characters of the alphabet, number system, punctuation
set, etc. While the average English word (in running text)
is between four and five letters in length, the average
dictionary word (with small proportions of common words)
will naturally be longer, and words will often be too long
to fit within a six-character register.

For this reason use was made of a facility of FORTRAN 
programming called "double-precision" addressing, which 
permits a set of two such six-character words to be 
addressed as if it were one. This scheme permitted English
words to be packed in up to 12 characters, but truncated 
any words longer than 12. 

Since each word was originally read in from a punched 
card, 80 characters in length, the first problem of pro- 
cessing a sentence was to reorganize these characters into 
words. Such markers as spaces and punctuation permitted 
identification of such words, and these were then "packed" 
from the loose original array, which was organized with one 
character in each computer register, into the denser 12- 
character registers. Then these text words could be 
compared with the dictionary words by comparing the first 
six-character register of each word. If a match were made, 
the second six-character register was also examined, and 
if another match were found, a hit was recorded for the 
particular list examined. This method of "packing" such 
words, then, permitted two economies: a large economy of
space, since 1000 English words could be contained in only
2000 computer registers; and a large economy of time, since
a match of the first six letters could be made in just one
arithmetic comparison of one cell with another.
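
The packing scheme just described can be illustrated with a short modern sketch. It is written in Python rather than in the FORTRAN IV of Appendix A, and the character codes and the names ALPHABET, pack_register, pack_word, and same_word are all assumptions made for illustration; the actual 7040 character set and the original routines may well have differed.

```python
# Hypothetical 6-bit character codes (blank lowest, then A-Z, then digits).
# The real IBM 7040 code table differs in detail; this ordering is assumed.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
CODE = {ch: i for i, ch in enumerate(ALPHABET)}
BITS_PER_CHAR = 6


def pack_register(chars):
    """Pack six characters into one integer, first character most significant."""
    value = 0
    for ch in chars:
        value = (value << BITS_PER_CHAR) | CODE[ch]  # KeyError on unknown symbols
    return value


def pack_word(word):
    """Truncate or blank-pad a word to 12 characters and pack it into two registers."""
    padded = word.upper()[:12].ljust(12)
    return pack_register(padded[:6]), pack_register(padded[6:])


def same_word(a, b):
    """Compare first registers; look at the second register only if the first match."""
    return a[0] == b[0] and a[1] == b[1]
```

Because the first character occupies the highest bits, comparing two packed registers as numbers gives the same result as comparing the six letters alphabetically, which is the property the dictionary search relies upon.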

As is shown in Figure III-1, the student essays
were also input on punched cards, and the eventual proxes
were output on punched cards as well. (Later systems are
tape-based.)

This original FORTRAN IV program, as modified and used
throughout the length of this present report, is listed
with considerable comment in Appendix A. Since the
accompanying documentation is fairly extensive, we
shall not describe the program in any great detail here,
although it is obviously one substantial product of the
work. In general, however, the effort was to make a program
that would be: (1) efficient, so that expenditure
of time would not be too great; (2) modular, so that it
might be easily understood, and altered as circumstances
would require; (3) general, so that dictionaries, numbers,
and functions could be easily changed; and (4) mnemonic, so that
variable names would be reasonably easy to learn and
remember.

An example of the modular and mnemonic nature of the 
program might be seen in the function which searches for 
a given text-word in any particular dictionary. This function
is called INTABL, and appears in statements of the
form:

IF (INTABL (WORD, PREP, 100)) GO TO 900 

Here the argument WORD refers to the particular essay 
word to which the DO loop has brought us in our data pro- 
cessing. Let us say that such a word might be AFTER. The 
argument PREP refers to the sub-dictionary containing pre- 
positions, which is stored in core, and may be quickly 
searched. And the argument 100 is the (maximum) length of 
that list of prepositions. The function INTABL causes the 
program to transfer to a subroutine, which makes a search 
in that list called PREP for the word (in this hypothetical 
case, for the word AFTER). If the word is found in the 
list of prepositions, then the function INTABL is "TRUE," 
and the command of the IF statement is followed. In the
present case, this means a transfer to statement number 900. 
If the word AFTER had not been found in this subdictionary, 
then the operation would have moved to the next statement 
following the IF, whatever that might be. 

The manner of the search may also be of some interest, 
since dictionary look-up is surely one of the principal 
operations in the program. In a completely random sequence 
an exhaustive search would have to be made through the list
in question; this would be much too inefficient. Rather, 
some advantage may be taken of the alphabetical sequence, 
and of the fact that the order of the letters corresponds 
with the size of the binary numbers in which the letters are 
represented. This means that early letters (such as A)
will be represented by low binary numbers (with many zeroes).
This also means that a word may be easily compared with a
given spot in the list, and it may be said whether that
word matches it, or belongs earlier in the list, or later.
This is sometimes referred to as an "equal to, less than, or
greater than" comparison.

Such a comparison permits several techniques. The 
most obvious is to plod through the list until the point 
is reached where the word should be, alphabetically speak- 
ing. If it is not there, then the operation may be re- 
turned to the main program, with the value FALSE. This 
technique of using the alphabet in a straight linear search 
will, then, obviously save about half the search time for 
the word in question. 
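
The straight linear search with an alphabetical stopping point might be sketched as follows. This is a Python illustration under assumptions, not the routine of Appendix A: the name in_table_linear and the returned comparison count are introduced here only to make the saving visible.

```python
def in_table_linear(word, table):
    """Scan an alphabetically sorted list, stopping once the word's place is passed.

    Returns (found, comparisons) so that the cost of the search can be inspected.
    """
    comparisons = 0
    for entry in table:
        comparisons += 1
        if entry == word:
            return True, comparisons
        if entry > word:
            # We are already past the spot where the word would have to be.
            return False, comparisons
    return False, comparisons


# A small, hypothetical sub-dictionary of prepositions, in alphabetical order.
PREP = ["ABOUT", "AFTER", "AT", "BY", "FOR", "FROM", "IN", "OF", "ON", "TO", "WITH"]
```

With this list, a search for AND stops at the third entry (AT) rather than running through all eleven; only a word that sorts after every entry, such as ZEBRA, costs a full pass.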

A more advanced search technique, however, is what 
is called a binary search. This operates by going at once 
to the middle of the list, and making the comparison at that 
point. If the word is earlier, then the first half of the 
list is divided, and a comparison is made with the list at 
that quarterpoint. The list keeps being narrowed by half 
each time a comparison is made, so that very soon the 
comparison is narrowed to a single word: if the text word 

does not match the list at this point, the operation re- 
turns to the main program with the value FALSE. Such a 
binary search obviously capitalizes on the great economy 
of the exponential number. And this is an economy which 
rises rapidly as the dictionary increases in size. The 
number of comparisons made will be about the logarithm base 
2 of the number of words in the dictionary. That is, if D 
is dictionary size, and D = 2^n, then n is the number of
comparisons required, in the usual case, to ascertain
whether any word is present in the dictionary. Then if a
dictionary is 16 words long, about four comparisons will
locate a word in it. This may not seem a large saving over the
linear alphabetical search, when the time is added to
compute the next comparison. But if a dictionary is 2,048
words long, a mere 11 comparisons will locate a word's
proper space, and this binary search yields a great saving
indeed.
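
A binary search of the kind described can be sketched in a few lines. The Python below is an illustration only; the name in_table_binary and the probe counter are assumptions, and the exact number of probes depends on how a comparison is counted (this version needs at most one more than the base-2 logarithm of the list length).

```python
def in_table_binary(word, table):
    """Search an alphabetically sorted list by repeated halving.

    Returns (found, probes), where probes is the number of list entries examined.
    """
    low, high = 0, len(table) - 1
    probes = 0
    while low <= high:
        middle = (low + high) // 2
        probes += 1
        if table[middle] == word:
            return True, probes
        if table[middle] < word:
            low = middle + 1   # the word can only lie in the later half
        else:
            high = middle - 1  # the word can only lie in the earlier half
    return False, probes
```

For a sorted list of 2,048 entries this examines no more than 12 entries for any word, present or absent, against as many as 2,048 for the linear scan.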

Other lookup techniques, some even more economical in
time, are discussed elsewhere (Hays, 1967, Chapter 5). 
Without such efficiencies as binary search, practical 
essay-grading would be prohibitively expensive. A 
number of other efficiencies were introduced into this 
program as well. 

Preparation of the text. As we have said, eventual
implementation will require some fairly direct input process
from the student to the computer, at least for ordinary
classroom use. For research purposes, we had the essays keypunched
by clerks at the University of Connecticut, according
to a fairly obvious format. Since at that time our
key-punch machines had no upper-lower case differentiation,
all typing was in capital letters. Also, the punctuation
set was not complete, so that we employed the following
conventions:



Name              Typewritten    Machine Convention

Period            .              .
Comma             ,              ,
Semicolon         ;              .,
Colon             :              ..
Exclamation       !              .X
Question Mark     ?              .Q
Italics                          (/)xxx
Dash              —
Apostrophe        '
Quote             "              *
Of course, these made no important difficulty in the
programming, since two consecutive symbols are very easy
to look for. In order to distinguish a period (abbreviation)
from a period (end of sentence), we looked for two
spaces after it, and took that to mean end of sentence.
Similarly, a new paragraph was signalled by four blank
spaces at the beginning of a new line.
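
These keypunch conventions can be decoded mechanically. The following Python sketch assumes the two-symbol codes as they are read from the conventions table (period-comma for semicolon, two periods for colon, .X and .Q for exclamation and question mark); the function names are hypothetical, and the handling of cases the text does not describe (for example, a sentence ending at the very end of a card) is deliberately left out.

```python
# Two-symbol machine conventions, as read from the table of conventions.
TWO_SYMBOL_MARKS = {".,": ";", "..": ":", ".X": "!", ".Q": "?"}


def decode_marks(line):
    """Replace each two-symbol convention with the mark it stands for."""
    out = []
    i = 0
    while i < len(line):
        pair = line[i:i + 2]
        if pair in TWO_SYMBOL_MARKS:
            out.append(TWO_SYMBOL_MARKS[pair])
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


def count_sentence_ends(line):
    """Count periods followed by two spaces, the stated end-of-sentence signal."""
    return decode_marks(line).count(".  ")


def starts_paragraph(line):
    """A new paragraph is signalled by four blanks at the start of a line."""
    return line.startswith("    ")
```

Decoding before counting keeps a colon (two periods) followed by blanks from being mistaken for a sentence end, while an abbreviation such as DR. followed by a single space is passed over.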

Key-punching of these essays proceeded at about the
speed expected of ordinary typing, although verifying might
take somewhat longer than ordinary proof-reading. What
was more time-consuming was that the clerks were under instruction
to type the copy literatim, that is, including
every last mistake of the student in spelling, punctuation,
and word order. This took time, of course, because it
would be contrary to the habits of a career devoted to
eliminating such mistakes.

The most important aspect of the text preparation was 
that nothing was done to the text which was not required 
for it to be machine readable. In no case was any human 
coding of it done for any purpose of the subsequent research 
(for example, to identify verbs, nouns, etc.). This means
that the copy to be read by the computer was in almost 
every obtainable way just what the student himself would 
presumably have written, if he had known how to typewrite 
and had typed it himself on the key-punch. 

Summary. This chapter has elaborated the sampling,
hypotheses, proxes, programs, and procedures for the investigation
of machine analysis of essays. The principal
program so fundamental to the work is found in Appendix
A of this report. The next chapter will treat some results
of importance from such analysis.



CHAPTER IV



PREDICTING OVERALL QUALITY 

This chapter will describe some of the findings, and 
implications of the findings, from the attempt to predict 
the rating of overall quality of writing. This describes work 
done in 1965, 1966, and 1967, largely concerned with the 
data from the Wisconsin study, which has formed a focus 
for much of the research on style up to the present time. 

Human ratings. As has been made clear from Chapter
II, the principal strategy of the work has aimed at the
simulation of human judgments, and these human judgments
are therefore very important. The instructions used for
the ratings in Wisconsin were described by McColly and
Remstad (1963). They asked for ratings on "overall
quality", and they had four independent judges for each 
essay, and four essays for each student subject. The 
individual judges were qualified, but their personal 
characteristics are not of much importance for our study, 
and the so-called "individual" ratings represent a kind 
of statistical artifact. That is, when essays are re- 
garded as rows, and the judgments are represented in four 
columns, each of these columns is a kind of composite, 
since it may contain ratings from many of the judges used 
in the Wisconsin study. Each particular element in the 
column is a rating by one human judge, but the column as 
a whole may be the contribution of many such judges. 

With this understanding, it is still worthwhile to
observe the agreement among these statistical judges.
For our purposes, we chose two essays to focus upon, written
about one month apart from each other. One was written
on the question of whether the "best things in life
were really free," and the other on the "uses of anger."
These will be called Essay C ("Free") and Essay D ("Anger").
For Essay C, the interjudge agreement is shown in the
upper-left quadrant of Table IV-1.

Here the kind of agreement among judges is shown which
is usually found for independent, subjective evaluations
where there has been a certain amount of coaching, here
ranging around .50 correlation of each individual judge
with his peer.

It is to be expected that increasing the number of 
judges will increase the correlations, since it eliminates 
some random error from the judgments. To demonstrate this 
improvement, we have combined the columns of judgments 
in various ways, to find the effect of increasing the 
judges to two. When Column 1 (standing for the first
column of ratings for Essay C) is pooled with Column 2,
this sum, shown in Column 5, may be correlated with that of C3
+ C4, shown in Column 6. The discovered correlation is
.66, clearly higher than that between any two columns
considered singly.

Additional comparisons may be made in a similar fashion,
and the results (in natural order) are: .66, .67, .70, .70,
.67, .66. A more complete listing of such intercorrelations,
both between human judges, between human judge pairs, and
between the single and combined columns, is shown in other
cells of Table IV-1. Other parts of Table IV-1 will be
discussed later in the chapter.

From psychometric theory, as well as from such empirical
evidence, we would expect that the reliability of all
four columns summed together would be higher still. When
such a summation is done, however, it may no longer be
correlated with others in the same fashion, since all of
the data have been used. It is nevertheless possible to
estimate such reliability through an analysis of variance
of the columns, and such analysis was reported by McColly
and Remstad, producing a reliability coefficient of about
.83 for each of the two essays we are concerned with.
Such a reliability is not very impressive for such an expensive
rating process, but it is typical of such evaluation,
and it does furnish an adequate target for the multiple
regression of the proxes.

TABLE IV-1

HUMAN JUDGE AND JUDGE PAIR
CORRELATIONS FOR ESSAY C QUALITY
WITH PREDICTIONS FROM ESSAY D
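
The gain to be expected from pooling judges can be illustrated numerically. The report does not name a formula for this; the Python sketch below assumes the Spearman-Brown formula, the standard psychometric expression for the reliability of a sum of parallel ratings, and the function name is hypothetical. The .83 quoted above came from an analysis of variance, not from this calculation.

```python
def spearman_brown(single_judge_r, judges):
    """Projected reliability of the pooled ratings of `judges` parallel judges.

    single_judge_r is the typical correlation between two single judges.
    """
    return judges * single_judge_r / (1 + (judges - 1) * single_judge_r)


# Starting from the single-judge agreement of about .50 noted in the text:
two_judges = spearman_brown(0.50, 2)    # about .67, near the observed .66 to .70
four_judges = spearman_brown(0.50, 4)   # .80, near the reported .83
```

On this assumption the observed pair correlations and the four-judge reliability are roughly what an agreement of .50 between single judges would lead one to expect.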

Having two different essays from each student writer, 
we may collect a certain amount of information about both
individual and group stability across trials. Table IV-2 
shows the means and standard deviations for the two stu- 
dent essays, first for Essay C ("Free") , then for Essay D 
("Anger"). As explained previously, these proxes as shown 
here are not the raw frequencies for the essays, since such 
frequencies would have usually a large contaminating factor 
of essay length. Rather, they are the scores as converted 
to ratios and then multiplied to make a positive integer 
in each case. The transformation formulae are given in 
the FORTRAN program, printed in Appendix A. 
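
The general idea of such a transformation can be sketched as follows. This is only an illustration in Python: the actual formulae are those of Appendix A, and the function name, the choice of denominator, and the scale factor used here are all hypothetical.

```python
def scaled_ratio(count, total, scale):
    """Express a raw count relative to a total, multiplied up and rounded.

    count: raw frequency of some feature in the essay (e.g., commas)
    total: the quantity it is taken relative to (e.g., words in the essay)
    scale: hypothetical multiplier chosen so that scores come out as integers
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(scale * count / total)
```

Under this sketch an essay of 400 words with 20 commas and an essay of 800 words with 40 commas both receive a score of 50 when scale is 1000, so that the score no longer reflects essay length.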

The proxes employed have been previously described in
Chapter III, together with the reasoning employed for each
and a prediction of the anticipated direction of correlation.
These proxes were measured in the D essays, using
an earlier version of the program listed here in Appendix
A, and these proxes were then used in a multiple-regression
analysis to predict the human judgments for Essay D. Among
the aspects examined were the correlation of each prox
with the criterion, the beta weight contributed by each
prox, and the test-retest reliability of each prox. This
information is summarized in Table IV-3.

In this table, Column A lists the proxes by title,
and in the same order as described in the last chapter.
Column B shows the correlation of each prox with the criterion,
which was the sum of four human ratings for each
essay. And Column D indicates the test-retest reliability



TABLE IV-2

MEANS AND STANDARD DEVIATIONS OF
THE PROX SCORES

                                  Essay C             Essay D
Proxes                            Mean      St. Dev.  Mean      St. Dev.

 1. Title present                 .90       .29       [illeg.]  .38
 2. Av. sentence length           176.79    41.91     175.48    41.85
 3. Number of paragraphs          5.47      2.14      5.16      1.90
 4. Subject-verb openings         49.27     14.01     45.01     12.36
 5. Length of essay in words      397.40    112.62    361.32    104.44
 6. Number of parentheses         2.02      4.55      1.67      4.09
 7. Number of apostrophes         9.62      9.20      8.75      8.28
 8. Number of commas              48.81     22.26     40.97     21.64
 9. Number of periods             56.70     12.36     58.25     12.20
10. Number of underlined words    1.74      3.65      1.42      3.26
11. Number of dashes              1.97      4.50      1.37      3.39
12. No. colons                    .57       1.59      .47       1.72
13. No. semicolons                1.54      [illeg.]  1.22      2.73
14. No. quotation marks           293.19    275.16    114.45    156.48
15. No. exclamation marks         8.61      23.36     [illeg.]  34.00
16. No. question marks            53.35     [illeg.]  25.46     49.37
17. No. prepositions              9.73      1.76      8.90      1.80
18. No. connective words          .35       .51       .43       .58
19. No. spelling errors           .11       .31       .12       .34
20. No. relative pronouns         2.03      1.05      1.93      .96
21. No. subordinating conjs.      2.22      1.00      2.89      1.14
22. No. common words on Dale      [illeg.]  4.58      79.17     5.09
23. No. sents. end punc. pres.    99.07     3.23      99.52     1.82
24. No. declar. sents. type A     92.45     8.48      95.48     6.35
25. No. declar. sents. type B     .56       1.63      .61       2.11
26. No. hyphens                   2.53      4.69      1.95      3.94
27. No. slashes                   .05       .39       .10       .59
28. Aver. word length in ltrs.    423.77    23.41     438.36    24.47
29. Stan. dev. of word length     217.72    20.31     232.32    23.13
30. Stan. dev. of sent. length    82.56     29.63     78.07     31.83

NOTE: These means and standard deviations are based upon the transformed
scores, altered so that every individual score would be a
positive integer, and would usually express a relative rather than
an absolute frequency.
TABLE IV-3

PROXES USED TO PREDICT A CRITERION
OF OVERALL QUALITY (ESSAY D)

A.                                B.          C.        D.
Proxes                            Corr. with  Beta      Test-Ret. Rel.
                                  Criterion   Wts.      (Two essays)

 1. Title present                 .04         .09       .05
 2. Av. sentence length           .04         -.13      .63
 3. Number of paragraphs          .06         -.11      .42
 4. Subject-verb openings         .16         -.01      .20
 5. Length of essay in words      .32         .32       .55
 6. Number of parentheses         .04         -.01      .21
 7. Number of apostrophes         .23         -.06      .42
 8. Number of commas              .34         .09       .61
 9. Number of periods             .05         -.05      .57
10. Number of underlined words    .01         .00       .22
11. Number of dashes              .22         .10       [illeg.]
12. No. colons                    .02         -.03      .29
13. No. semicolons                .08         .06       .32
14. No. quotation marks           .11         .04       .27
15. No. exclamation marks         .05         .09       .20
16. No. question marks            -.14        .01       .29
17. No. prepositions              .25         .10       .27
18. No. connective words          .18         -.02      .24
19. No. spelling errors           .21         -.13      .23
20. No. relative pronouns         .11         .11       .17
21. No. subordinating conjs.      -.12        .06       .18
22. No. common words on Dale      .43         -.07      .65
23. No. sents. end punc. pres.    .01         -.08      .14
24. No. declar. sents. type A     .12         .14       .34
25. No. declar. sents. type B     .02         .02       .09
26. No. hyphens                   .18         .07       .20
27. No. slashes                   .07         -.02      -.02
28. Aver. word length in ltrs.    .51         .12       .62
29. Stan. dev. of word length     .53         .30       [illeg.]
30. Stan. dev. of sent. length    .07         .03       .43

Number of students judged was 272. Multiple R against human criterion
(four judges) was .71 for both Essay C and Essay D (D data shown here).
F-ratios for Multiple R were highly significant.
for the proxes, that is, the correlation between Essay C
and Essay D for the proxes, as a measure of writing habit, 
or stability of writing behavior, in the student writers. 

Overall prediction of the proxes. In multivariate
analysis, it is often pointless to elaborate a hypothesis
for each predictor, and to explain how each variable met
expectations, or failed to do so. But it may be instructive
to note how well the predictions fared as a whole.
While some of the predictions were very tentative and loose,
and while many of the variables obviously had only a non-significant
relation with the criterion, some estimate may
be made of the overall success of the predictions.

In general, the predictions were quite accurate,
notwithstanding the obvious large random errors in the
relationships which are evident in the table. The degree
of success was examined and the results are shown in Table
IV-4, which displays a contingency diagram for the direction
of prox correlation with the criterion (positive or
negative direction), and shows the relation between the
predicted and discovered directions. Here the number of
agreements is seen as 21, and disagreements 7. As is also
shown in Table IV-4, the chi square was computed to be
3.12, which, with one degree of freedom and the assumption
that a one-tailed test is appropriate for such agreement,
is significant at the five per cent level of confidence.
One may conclude, therefore, that most of the predictions
were significantly in the correct direction.

Correlation with the criterion. It does not make
much sense to describe a summary table in any detail, but
it is useful to comment on a few outstanding points. As
was explained in the last chapter, many of the predictors
used were "proxes of opportunity", and it is not surprising
that they were relatively unproductive. This is generally
true for the large number of punctuation marks. The
major contributors to empirical prediction were usually
foreseen.



TABLE IV-4

THE DIRECTION OF CORRELATIONS OF PROXES
WITH THE CRITERION:
PREDICTED AND OBSERVED FREQUENCIES

                      Observed
                     +        -
Predicted    +      17        1
             -       6        4

N = 28, since two variables were not predicted.

\chi^2 = \frac{N \left( |AD - BC| - \frac{N}{2} \right)^2}{(A+B)(C+D)(A+C)(B+D)} = 3.12 (significant).
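
The chi square reported for Table IV-4 can be checked by direct computation. The Python sketch below assumes the usual continuity-corrected formula for a fourfold table, with A, B, C, D read across the rows of the table (17, 1, 6, 4); the function name is hypothetical.

```python
def corrected_chi_square(a, b, c, d):
    """Chi square for a 2 x 2 table with the correction for continuity.

    Cells are read across the rows:  a  b
                                     c  d
    """
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


chi_square = corrected_chi_square(17, 1, 6, 4)
# With one degree of freedom, a one-tailed test at the .05 level
# requires a chi square of about 2.71.
ONE_TAILED_05 = 2.706
```

The result rounds to 3.12, in agreement with the table, and exceeds the one-tailed five per cent value for one degree of freedom.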
The average sentence length was anticipated, from
other literature, to be more important in a multivariate
than bivariate way, and this is apparently the case. But
these higher-order relationships, which are only hinted at by
the beta weights and intercorrelations, are very difficult
to articulate. The length of essay in words, number of
commas, number of prepositions, number of connectives,
relative pronouns, spelling errors, common words, and long
words were all in the anticipated direction.

On the other hand, there were some surprises in the 
data. The number of question marks was predicted to be 
indicative of variety in style, yet was a negative predictor.
On the second set of essays analyzed, however,
it has moved from -.14 to .08, which implies that there
may be an interaction of this feature with the wording of 
the assigned topics, or of the accompanying instructions 
to the student writers. Such an interaction becomes 
plausible in light of the interrogative wording of the 
"Free" question. 

Another surprise was the negative correlation with
the criterion for variable 21, the proportion of subordinating
conjunctions. The assumption that the proportion
would reflect complexity, and that complexity would be related
to maturity of style, was not destroyed, but it surely was
shadowed by the negative correlation of -.12 for the D
essays. Here an interaction with topic is not a plausible
explanation, since for Essay C the discovered correlation
had moved only slightly, to -.06. It is worth noting that
for both essays, when the other predictors are taken into
consideration, the beta weights for subordinating conjunctions
are both positive. But here again, explanations of
such higher-order effects are difficult to ascertain. It
is probable that the explanations of this surprise should
be pursued in the specific words in the list of subordinating
conjunctions, and in further syntactic analysis of the
sentences where they are used. This sort of exploration is
further discussed in the final chapter of this report.
Surely the question of fluency is a most important one
in the evaluation of essays. Two strong arguments for some
general trait of prolixity appear in the importance of word
length and essay length, the first being the highest
correlate with the criterion, the second yielding the highest
beta weight. And these relative positions are maintained
for the C essays, as well. So that such comparisons
may be easily made, Table IV-7 contains the prox information
for Essay C, the "Free" essay. Column A has of course
the title of the proxes. Column B shows again the correlation
with the criterion. And Column C displays the beta
weights for the proxes, when all variables are used to
maximize the prediction of overall human judgment.

This table (IV-7) has the same status as Table IV-3, 
for the D essays. The D essays were presented first only 
because, historically, they were analyzed first. In fact, 
what they have in common is at once apparent to the naked 
eye. Most of the important correlations with the criterion 
are maintained in Table IV-7, and most of the important beta 
weights have sustained their contributions with the second
essays.

Multiple regression. From the standpoint of overall
simulation, the multiple correlation obtainable for the 
pooled human judgments is the primary goal of the analysis. 
For Essay D, the multiple-R achieved was a rather startling 
.71. And when it was possible to perform the same analysis 
for Essay C, although there were obvious changes as we have 
seen, the resultant multiple-R was once more (coincidentally) 
just .71. This coefficient means that for this set of 
proxes, and for these sets of essays, the correlation be- 
tween the human ratings actually achieved, and the "pre- 
dicted" ratings generated by the discovered beta vector, 
would be .71. Given the looseness of human rating, and 
the pooled human reliability of only .83, the multiple re- 
gression coefficient is encouraging in the extreme. 
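
For readers unfamiliar with the computation, the relation between beta weights and the multiple-R can be shown in a small sketch. The Python below is not the regression program used in the study; it assumes the textbook route through the correlations (solving the normal equations for standardized variables), and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(rows[i][k] * solution[k] for k in range(i + 1, n))
        solution[i] = (rows[i][n] - known) / rows[i][i]
    return solution


def beta_weights_and_multiple_r(predictors, criterion):
    """Return (beta weights, multiple R) for predictors given as lists of scores."""
    k = len(predictors)
    r_xx = [[pearson(predictors[i], predictors[j]) for j in range(k)] for i in range(k)]
    r_xy = [pearson(p, criterion) for p in predictors]
    betas = solve(r_xx, r_xy)
    # R squared is the sum of each beta weight times its correlation with the criterion.
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    return betas, math.sqrt(max(0.0, r_squared))
```

The multiple-R so obtained is the correlation between the criterion and the best weighted composite of the predictors; with a single predictor it reduces to the ordinary correlation.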
TABLE IV-7

PROXES USED TO PREDICT A CRITERION
OF OVERALL QUALITY (ESSAY C)

A.                                B.
Proxes                            Corr. with
                                  Criterion

 1. Title present                 .03
 2. Av. sentence length           -.07
 3. Number of paragraphs          .08
 4. Subject-verb openings         -.01
 5. Length of essay in words      .25
 6. Number of parentheses         -.05
 7. Number of apostrophes         -.16
 8. Number of commas              .36
 9. Number of periods             .01
10. Number of underlined words    [illeg.]
11. Number of dashes              .31
12. No. colons                    .14
13. No. semicolons                .09
14. No. quotation marks           .12
15. No. exclamation marks         -.04
16. No. question marks            .08
17. No. prepositions              .16
18. No. connective words          .11
19. No. spelling errors           [illeg.]
20. No. relative pronouns         .01
21. No. subordinating conjs.      -.06
22. No. common words on Dale      -.37
23. No. sents. end punc. pres.    .12
24. No. declar. sents. type A     -.00
25. No. declar. sents. type B     [illeg.]
26. No. hyphens                   .26
27. No. slashes                   .03
28. Aver. word length in ltrs.    .37
29. Stan. dev. of word length     .45
30. Stan. dev. of sent. length    .08

C. Beta wts. (listed in the order printed; only 29 of the 30 values
are legible, so they are not assigned to rows here):

.06  .09  -.02  .09  .03  -.05  -.09  -.29  -.01  .07
-.15  -.06  .17  -.12  -.09  -.05  -.06  .10  .01  .10
.25  .15  .34  -.05  .11  .00  -.03  .26  .09






As is well known, however, we should not expect all of
this accuracy if we took new essays and applied the discovered
beta weightings to them, to predict their human
ratings. For any set of scores, or any set of resultant
correlations, contains not only true variance associated
with the variable, but also a certain amount of error
variance, random for the particular subjects concerned,
which will not ordinarily be found with a new set of human
subjects, or essays. The true variance gives us information
which will be subsequently useful. But the error
variance is also capitalized upon by the analysis, and a
certain portion of the multiple-regression coefficient,
and of the contributing beta weights, will spuriously
seem to contribute, but will not stand up in a replication.

When one does run such an analysis, then, and subsequently
cross-validates the weightings with new data, the
resulting predictions will not correlate as highly with
the criterion as one might hope. The statistical loss is
commonly spoken of as "shrinkage" and has been widely
treated in the literature (e.g., McNemar, 1962). Fortunately,
empirical cross-validation is not always necessary,
since the performance of such data may partly be predicted
mathematically. As one would suppose, the larger the number
of subjects, the more reliable the multiple-R will be;
but the larger the number of variables (given the same
number of subjects), the less reliable the multiple-R will
be.
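
One way in which such shrinkage may be estimated mathematically is sketched below. The report does not say which estimate it had in mind; the Python here assumes the familiar Wherry-type correction (the "adjusted" squared multiple correlation), and the function name is hypothetical.

```python
import math


def shrunken_multiple_r(multiple_r, subjects, predictors):
    """Estimate the multiple R to be expected after shrinkage.

    Assumes the adjustment  1 - (1 - R^2)(N - 1)/(N - n - 1),
    where N is the number of subjects and n the number of predictors.
    """
    if subjects - predictors - 1 <= 0:
        raise ValueError("need more subjects than predictors plus one")
    r_squared = multiple_r ** 2
    adjusted = 1 - (1 - r_squared) * (subjects - 1) / (subjects - predictors - 1)
    return math.sqrt(max(0.0, adjusted))


# The figures of the present study: R = .71, 272 students, 30 proxes.
estimate = shrunken_multiple_r(0.71, 272, 30)
```

On this assumption the multiple-R of .71 would shrink to roughly .67, and the estimate behaves as the text describes: it rises with more subjects and falls with more predictors.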

The Paulus tables. Since our work of essay analysis 
continues to be heavily dependent upon multiple regression,
Dieter Paulus has made an investigation of the behavior 
of such data, given a varying N of subjects, and varying n 
of variables. Some of his findings are set forth in a 
usable form in Tables IV-8 and IV-9. Table IV-8 shows the 
minimum Multiple R coefficients required for significance 
TABLE IV-8

MINIMUM MULTIPLE CORRELATIONS REQUIRED
FOR SIGNIFICANCE AT THE .05 LEVEL

NUMBER                                        SAMPLE SIZE
PREDICTORS  50        75        100       125       150       175       200       250       300       400

 5          .4652     .3815     .3308     .2963     .2703     .2509     .2346     .2099     .1916     [illeg.]
10          .5898     .4861     .4221     .3788     .3460     .3215     .3008     .2694     .2459     [illeg.]
15          .6828     .5635     .4901     .4426     .4047     .3746     .3498     .3143     .2870     [illeg.]
20          .7565     .6282     .5485     .4942     .4513     .4180     .3925     .3521     .3217     [illeg.]
25          .8207     .6858     .5994     .5388     .4927     .4578     .4290     .3851     .3520     [illeg.]
30          .8751     .7337     .6429     .5778     .5301     .4929     .4622     .4151     .3796     [illeg.]
35          .9227     .7790     .6831     .6166     .5641     .5249     .4924     .4415     .4039     .3498
40          .9623     .8189     .7203     .6492     .5958     .5536     .5183     .4648     .4265     .3707
45          .9923     .8568     .7549     .6812     .6261     .5809     .5455     .4897     .4471     .3875
50                    .8906     .7875     .7118     .6540     .6074     .5694     .5115     .4684     .4063







TABLE IV-9

MINIMUM MULTIPLE CORRELATIONS REQUIRED
FOR SIGNIFICANCE AT THE .01 LEVEL

NUMBER                                        SAMPLE SIZE
PREDICTORS  50        75        100       125       150       175       200       250       300       400

 5          .5312     [illeg.]  .3819     .3433     .3135     .2907     .2724     .2444     .2231     .1933
10          .6471     .5382     .4705     .4234     .3871     .3599     .3369     .3021     .2764     .2396
15          .7322     .6107     .5345     .4837     [illeg.]  .4114     .3839     .3452     .3160     .2741
20          .7996     .6726     .5901     .5327     .4893     .4532     .4256     .3831     .3502     .3033
25          .8572     .7257     .6397     .5764     .5293     .4917     .4621     .4155     .3809     .3301
30          .9042     .7703     .6802     .6134     .5640     .5254     .4931     .4447     .4070     .3520
35          .9444     .8116     .7184     .6510     .5966     .5574     .5247     .4728     .4319     .3739
40          .9762     .8488     .7529     .6824     .6286     .5847     .5484     .4933     .4528     .3931
45          .9968     .8825     .7852     .7125     .6575     .6113     .5733     .5166     .4734     .4122
50                    .9129     .8151     .7408     .6829     .6355     .5958     .5390     .4943     .4296



(at the 5% confidence level) with different n and N. 

The number of predictors is scaled from 5 to 50 along the
left-hand column, and the number of subjects is scaled from
50 to 400 along the top. It may be easily seen, then,
that for the present investigation, where predictors number
30 and cases just over 250, a multiple-R of about .41 is
necessary for significance at the .05 level.

Table IV-9 shows similar requirements for the .01 level
of confidence, showing that around .44 is necessary to
reject the null hypothesis. These Paulus tables are very
convenient in dealing with large numbers of such
coefficients, and seem to be a useful by-product of the
present research. (For the computational reasoning, see
Kelley, 1947, p. 475.)
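
The relation between a tabled coefficient and its significance
test may be sketched briefly. The equations Paulus programmed
are not reproduced in this report, so the sketch assumes the
standard F test for a multiple correlation, with n and
N - n - 1 degrees of freedom. The function names are
illustrative only, and the critical F itself would still be
taken from an F table.

```python
import math


def f_statistic(r, n_predictors, n_cases):
    """F ratio for testing a multiple R against zero (standard F test, assumed here)."""
    df_error = n_cases - n_predictors - 1
    r2 = r * r
    return (r2 / n_predictors) / ((1.0 - r2) / df_error)


def minimum_r(f_critical, n_predictors, n_cases):
    """Smallest multiple R that reaches a given critical F (inverse of f_statistic)."""
    df_error = n_cases - n_predictors - 1
    num = n_predictors * f_critical
    return math.sqrt(num / (num + df_error))


# Table IV-8 entry for 30 predictors and 250 cases.
f_at_table_value = f_statistic(0.4151, 30, 250)
# Round trip: the F just computed should lead back to the tabled R.
r_back = minimum_r(f_at_table_value, 30, 250)
print(round(f_at_table_value, 2), round(r_back, 4))
```

Under this assumption the tabled .4151 corresponds to an F of
about 1.52 on 30 and 219 degrees of freedom.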

There is another familiar problem in interpreting
regression, however, and this one depends on the reliability
of the criterion. It is obviously impossible to predict
perfectly a criterion which is itself not perfectly
reliable. And the reliability of a group of human raters
obviously depends on the number of such raters and on their
inter-judge agreement. As we have seen, the reliability
of the group of four raters in Wisconsin was .83, and this
means that about 31% of the variance (1.00 - .83²) would be
unexplained and indeed "unpredictable." When one is
considering purely practical predictions for groups that are
identical, it is reasonable to ignore this handicap. But
when one is attempting to assess the "true" accuracy of a
set of predictors, it is more fair to take such criterion
unreliability into consideration.
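
The dependence of pooled reliability on the number of raters
may be illustrated with a short sketch. The report does not
name a formula at this point; the Spearman-Brown relation is
assumed here, and the single-rater figure it yields is a
back-calculation for illustration, not a reported result. The
31% figure in the text corresponds to 1.00 minus the square of
.83.

```python
def pooled_reliability(single_r, k):
    """Spearman-Brown: reliability of the sum of k raters whose average agreement is single_r."""
    return k * single_r / (1.0 + (k - 1) * single_r)


def single_rater_reliability(pooled_r, k):
    """Invert Spearman-Brown to recover the implied average inter-judge agreement."""
    return pooled_r / (k - (k - 1) * pooled_r)


def unpredictable_share(reliability):
    """Share of variance left unexplained, computed as 1.00 minus the reliability squared."""
    return 1.0 - reliability ** 2


implied_single = single_rater_reliability(0.83, 4)  # hypothetical back-calculation
print(round(implied_single, 2))
print(round(pooled_reliability(implied_single, 4), 2))
print(round(unpredictable_share(0.83), 2))
```

On these assumptions, four raters with a pooled reliability of
.83 would agree with one another, on the average, at about .55.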

Paulus designed Table IV-10 to do just this task.
The left column refers to the discovered multiple-R
coefficient, and the top headings refer to the measured
reliability of the criterion variable. Just by finding the
appropriate cell of this table, then, one may infer what
the discovered correlation might have been if the criterion
had been perfectly reliable. This table was produced, like
the two before, from equations programmed by Paulus for the
time-sharing Console in the Bureau of Educational Research
at Connecticut. It was based upon the division of the
multiple-R coefficient by the square root of the reliability
of the criterion variable (Kelley, 1947, p. 412).
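
The computation behind each cell of such a table is a single
division, and may be sketched as below. The function name and
the range check are illustrative assumptions; only the formula
(observed R divided by the square root of the criterion
reliability) comes from the text.

```python
import math


def correct_for_attenuation(observed_r, criterion_reliability):
    """Divide the observed multiple R by the square root of the criterion reliability."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return observed_r / math.sqrt(criterion_reliability)


# One cell of a Table IV-10 style grid: discovered R of .67, criterion reliability of .80.
corrected = correct_for_attenuation(0.67, 0.80)
print(round(corrected, 2))
```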

Still another table serving such needs was designed
to perform automatic "shrinkage" of multiple-R
coefficients. As we have noted, when MULTR is calculated, it
finds a maximum fit of weightings to the sample data. But
the sample data do not reflect merely the true covariances
of the population. They also reflect random error typical
only of the cases constituting the sample, and the
computational method capitalizes upon such random error, just as
it capitalizes upon the true covariance. And such random
error increases rapidly as n, the number of predictor
variables, increases. As we have also noted, however, the
sample size tends to counteract this mounting random error.
The "shrunken" multiple-regression coefficient, then, is
the statistical estimate of what the coefficient would have
been, if it had not capitalized on such random error. It
is therefore, of course, always smaller than the observed
coefficient.

There are several formulas available for such shrinkage.
Perhaps the most appropriate one is the Wherry formula
(Kelley, 1947, p. 474), expressed by:

    R_s^2 = \frac{(N - 1) R^2 - n}{N - n - 1}

where R_s is defined as the shrunken coefficient, R is the
discovered coefficient in the sample, N is the number of
cases in the sample, and n is the number of predictor
variables. The Wherry formula expresses what is believed
to be the "true" multiple-R coefficient in the population
of interest (rather than in some other sample from that
population). This formula thus seems most appropriate for
such exploratory research, where the population parameters
are indeed of central interest. And it was therefore
programmed into the computer to produce the various tables
for shrinkage.
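
A minimal sketch of the Wherry computation follows. It is not
the program Paulus wrote, which is not reproduced here; the
function name is illustrative, and returning zero when the
shrunken R squared would be negative is a convention assumed
for the sketch.

```python
import math


def wherry_shrunken_r(r, n_cases, n_predictors):
    """Shrunken multiple R from the Wherry formula as given in the text.

    Returns 0.0 when the shrunken R squared would be negative (an assumed convention).
    """
    r2_shrunken = ((n_cases - 1) * r * r - n_predictors) / (n_cases - n_predictors - 1)
    return math.sqrt(r2_shrunken) if r2_shrunken > 0 else 0.0


# A few rows of a shrinkage table for 30 predictors and a discovered R of .71.
for n_cases in (100, 200, 300):
    print(n_cases, round(wherry_shrunken_r(0.71, n_cases, 30), 2))
```

The loop shows the behavior described above: for a fixed number
of predictors, the shrinkage becomes smaller as the sample
grows.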

These tables are listed in Appendix B, since they are 
too large to include conveniently in the running text. In 
Appendix B, the tables are divided according to size of n, 
the number of predictor variables. These sub-tables are 
as follows: 



             Organization of Table IV-11
                  (See Appendix B)

        Sub-Table        Number of Predictors

            A                     25
            B                     30
            C                     35
            D                     40
            E                     45
            F                     50
            G                     55



To avoid too massive a document, Paulus restricted
the size of n, therefore, to the range from 25 to 55. He
also restricted the sample size to a range from 100 to 300.
Both of these constraints mean that the tables are very
appropriate for studies of the present size, and a large
number of empirical studies seem to fall within these
limits.

Use of the tables. In the present case, we can
immediately apply certain of these tables to the discovered
data. We would ordinarily shrink the MULTR before we would
correct it for attenuation; therefore we would enter
Table IV-11(B), with n = 30, N greater than 275, and a
discovered R of .71. The appropriate cell yields us a
shrunken R of .67. Then we may enter Table IV-10 with a new
"discovered R" of .67, and a known criterion reliability
of over .80. In the appropriate cell we find a coefficient
of .75, when first shrunken and then corrected for
attenuation.
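
The two look-ups may be checked by hand. The arithmetic below
assumes N = 276 (the total sample reported later in this
chapter, taken here as the value meant by "N greater than 275")
and a criterion reliability of exactly .80. The symbols R_c and
r_cc, for the corrected coefficient and the criterion
reliability, are labels introduced only for this illustration.

```latex
R_s^2 = \frac{(N - 1) R^2 - n}{N - n - 1}
      = \frac{(275)(.71)^2 - 30}{245}
      = \frac{138.63 - 30}{245} \approx .443,
\qquad R_s \approx .67

R_c = \frac{R_s}{\sqrt{r_{cc}}}
    = \frac{.67}{\sqrt{.80}}
    \approx \frac{.67}{.894} \approx .75
```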

In this use as in other uses of such tables, the user
should always remember that these table cells are generated
from formulae which inevitably make certain theoretical
assumptions about the distribution of the data. One will
not necessarily expect, for instance, that cross-validation
with another sample will match the Wherry shrinkage with
any exactness. In the first place, violations of assumed
distributions will often cause a greater shrinkage through
cross-validation than one would expect. On the other hand,
the prediction of a new sample will often be, for the
reasons touched on above, lower than the corresponding
prediction would be of the population itself. But these
tables can surely supply rather good approximations to the
statistics which we may be very much interested in, but
cannot measure directly.

Reliability of proxes. To some extent, validation with
subsequent samples will depend upon the reliability of the
multiple correlation, and this will depend in part upon the
reliability of the individual proxes. As already noted,
the reliabilities of these proxes are shown in Column D of
Table IV-3. The coefficients of Column D are the
product-moment correlations between the different essays for a
particular prox. For example, for the second prox listed,
"average sentence length," the coefficient represents the
similarity between these averages for two different essays
written about one month apart. Thus the correlation is an
extremely conservative one, and seems a reasonable measure
of "writing behavior" under two separate (but quite similar)
stimulus situations.

A few generalizations may be made about the prox
reliabilities. In the first place, it seems that there
is a correlation between Column B and Column D. Those
proxes with the highest reliability are also typically those
which aid most in the prediction of overall quality. The
highest reliabilities seem to belong (in descending order
of magnitude) to: the proportion of common words (#22),
average sentence length (#2), average word length (#28),
proportion of commas (#8), and length of essay in words
(#5). With the exception of average sentence length, these
same proxes are among the best (bivariate) predictors of
writing quality, and even average sentence length is among
the more substantial contributors in the combined,
multivariate prediction.

A second generalization about such proxes is that 
their reliabilities may be related to the frequency of 
occurrence. Those proxes which deal with the most frequent 
events, such as average length of word, or proportion of 
common words, may have the highest reliability. Sentences, 
which are also found in a fair number within an essay, have 
a fairly stable reliability for average length (.63). And 
paragraphs, which are less frequent in an essay than sen- 
tences, have a frequency reliability which is somewhat 
lower (.42). On the other hand, the writing of a title,
which is a behavioral decision occurring only once in
the writing of each essay, has a practically non-existent
reliability. This is a generalization which is still very
tentative, and deserving of more exploration.

A third generalization, really a speculation, is that
there may be a significant interrelationship among the
reliability of the prox, the beta weight of the prox for
a particular essay, and the worthiness of the prox for
assessing the more stable writing behavior of the student.
It is worth studying to find out whether prediction of
future essays may be improved by modifying the beta weights
in accordance with the reliability of the proxes. This
possibility has not yet been analyzed within this project,
but is promising for the future, for practical prediction
purposes.

A final interesting speculation concerns the relation- 
ship between the reliabilities of the proxes and the re- 
liability of the total multiple-regression equation. It 
is a familiar observation in mental testing that the total 
score of a test, which is often a sum of various part 
scores, will frequently be more reliable than any of those 
part scores taken separately. But it might not be so 
obvious that the same phenomenon may occur in multiple 
regression, that the total predictive validity may con- 
ceivably be higher than the reliability of any of the con- 
tributing predictors. This appears to be the case here; 
but the mathematical aspects of this problem will not be 
analyzed within the scope of this report. 

Human and machine judgments. Now it would be valuable
to return to a further analysis of Table IV-1, since it 
has much to tell us about rater performance. Earlier in 
this chapter, it was noted that the upper left quadrant 
(for Columns 1-4) shows us the intercorrelations among 
the judge columns for Essay C. Columns 5 through 10 show 
the increased accuracy, or reliability, which may come 
from increasing the number of judges. 

In this portion of the table, many of the coefficients
are of course inflated artificially through a part-whole
agreement. Columns 1 and 5, for example, agree at
a level of .88, but since Column 5 is simply the sum of
Column 1 and Column 2, this has little empirical meaning.
Whether a coefficient is so contaminated may be at once
determined by reading the variable names in the leftmost
column of the table.
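
The size of such a part-whole artifact may be sketched under a
simplifying assumption, namely that the two judges' ratings
have equal variances. In that case the correlation of one judge
with the sum of the two depends only on their agreement with
each other. The function names are illustrative.

```python
import math


def part_whole_r(r_between_parts):
    """Correlation of one part with the sum of two equal-variance parts."""
    return math.sqrt((1.0 + r_between_parts) / 2.0)


def implied_part_r(r_part_whole):
    """Invert part_whole_r: the agreement between parts implied by a part-whole value."""
    return 2.0 * r_part_whole ** 2 - 1.0


# Even two unrelated judges (r = 0) would each correlate about .71 with their own sum.
print(round(part_whole_r(0.0), 2))
# The .88 quoted for Columns 1 and 5 is what two judges agreeing at about .55 would give.
print(round(implied_part_r(0.88), 2))
```

A coefficient of .88 between a judge and a sum containing that
judge is thus no evidence of unusual agreement.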

Column 11 represents the sum of all C ratings, and
therefore the agreement coefficients between 11 and all of
the earlier columns are similarly a part-whole artifact.
On the other hand, Columns 11, 12, and 13 do have
considerable significance when properly understood. Column 12
represents the sum of all D ratings, and is therefore the
best measure of external validity which one could wish for
the various ratings given by the human judges to the C
essays. And Column 13 represents the machine evaluation,
derived from multiple-regression analysis of the proxes
for Essay D. This information has particular meaning for
this project, as is here explained.

Human vs. machine "validity." An interesting sidelight
is cast upon the comparison of human and machine by looking
at some analyses of the human judges of Essay C compared
with human and machine judgments of Essay D. The most
important meaning of "validity," for an essay test, would
appear to be how well it predicts performance on another
essay by the same student writers. That is, in the long run
we are less interested in how reliable this particular
judgment of performance is, and more interested in how
well it assesses the student's general writing performance,
under somewhat differing circumstances. One important
measure of this validity, then, would be agreement of
ratings with those of other essays by the same students.



We would always expect such validities to be lower
than the agreement between raters on the same essays, for
not only would the ratings differ because of rater error
(or viewpoint), but they would differ also because of
intrinsic differences in performance of the student under
two sets of conditions. One interesting comparison of the
machine and the human judge, then, would be to match each
with the ratings of the expert group for some second essay.

This comparison was simplified for the present study
because, as we have seen, multiple essays were analyzed
for the same student writers, the "Free" essays and the
"Anger" essays. We have seen how the individual judges
(or their statistical summations) agreed with each other
in Table IV-1, on the C essays. Now it would be
instructive to see how well each of those "individual" judges
predicted the ratings on the D essays. For this
comparison they are correlated with those ratings given by the
group of four raters on the "Free" essays, so that their
coefficients each represent the correlation of an
individual with a group of individuals. And we may therefore
expect the correlations to be higher than those between
pairs of individuals, since some of the error (but by no
means all) will be eliminated by the larger number involved
in the group sums. The coefficients are also somewhat higher
than they should be, in one sense, since some of the same
judges were involved in evaluating both C and D essays.

When such comparisons are made, the four judges of
C are found to correlate with the pooled judgment of D
as follows: .56, .59, .60, and .42. These coefficients
produce an average of about .54 between these two essays.

On the other hand, we could look at the predictions
of the D essays generated by prox analysis of these same
essays, resulting from the multiple regression programs.
These predictions become a (reasonably) independent way
of estimating how well the student might do on another
essay. When these machine predictions of D, then, are
used to predict the students' actual performance on the C
essays (that is, the pooled expert judgment of such
performance), the coefficient is .53. This coefficient is
almost precisely that of the typical human judge, and
once again shows us how similar to the human individual
is this first approximation of a machine system.

This finding also furnishes another response to the
critic who supposes that the measures found through such
statistical procedures are entirely artifactual, and will
disappear upon validation. There could hardly be any
measure of validity of essay rating superior to this one,
and on this measure the machine performs, even in this
early state, as well as the expert human.
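
The comparison reported above involves only five coefficients,
and may be restated in a few lines. The figures are those
quoted in the text; the simple arithmetic mean is used because
the text itself speaks of an average, and the variable names
are illustrative.

```python
from statistics import mean

# Correlations of each Essay C judge with the pooled Essay D judgment, as quoted in the text.
human_validities = [0.56, 0.59, 0.60, 0.42]
# Correlation of the machine's Essay D prediction with the pooled Essay C judgment.
machine_validity = 0.53

human_mean = mean(human_validities)
gap = human_mean - machine_validity
judges_below_machine = sum(1 for r in human_validities if r < machine_validity)
print(round(human_mean, 2), round(gap, 2), judges_below_machine)
```

The machine falls about one point in the second decimal below
the average judge, and above the least successful of the four.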

Human vs. machine accuracy. There are two elements
of "accuracy" of rating which are submerged in the data, 
and for which there is no readily available statistic in 
common use. Both of these hidden statistics are extremely 
important and, as it happens, both would argue additional 
advantages on the side of machine grading, in almost any 
practical situation. 

When students speak of "fairness" in grading, they 
ordinarily are not speaking primarily of any correlation 
with some true score, as much as they are speaking of 
absolute comparisons with such a true score. As we know, 
the common correlational methods suppress both mean scores 
and score variances, in order to make the comparison on
standard scores alone. Therefore it would be possible to 
have two human raters "agree" perfectly, in terms of corre- 
lation coefficient, in that r might equal 1.00. Yet they 
might have not one rating in common. This would be the 
case if two teachers assigned the identical rank orders 
to a set of students, yet one assigned marks just one 
grade lower than the other teacher. To the typical student, 
such a question of "hard" or "easy" marking would be much 
more important than minor differences in correlation. 
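
The case of the two teachers may be made concrete with a small
sketch. The marks below are invented for illustration (a
five-point scale for six hypothetical students); the point is
only that the correlation is perfect while no single mark
agrees.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation, computed from deviations about each mean."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical marks on a 5-point scale (5 = A ... 1 = F) for six students.
lenient_teacher = [5, 4, 4, 3, 3, 2]
severe_teacher = [m - 1 for m in lenient_teacher]  # same rank order, one grade lower

r = pearson_r(lenient_teacher, severe_teacher)
marks_in_common = sum(1 for a, b in zip(lenient_teacher, severe_teacher) if a == b)
print(round(r, 2), marks_in_common)
```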

Another aspect of accuracy or "fairness" to the
student is the range of marks assigned. The student at
the bottom is very concerned whether the teacher is one
who fails students often. And the student at the top feels
that it is "unfair" if there is not a reasonable probability
of his getting an A. If an "accurate" or "fair" grade is
regarded as that which would be assigned by a group of
experts, acting independently of one another, then these
questions of mean grade, and of grade dispersion, are very
important to such accuracy.

In this way, of course, the machine can be incomparably 
superior to the human. Both mean grade and grade deviation 
may be determined entirely on the basis of the expert 
group, and if desired remain fixed for any group of students 
for whom the system is applied, regardless of the size of 
the group. We take such standards for granted with the 
national standard scores of such instruments as the Scholas- 
tic Aptitude Tests, yet ordinarily we despair of trying to 
achieve the same fairness with any marks administered on 
a local level. With the introduction of machine essay 
grading, it appears likely that the parameters of evalua- 
tion may be uniformly adjusted to any standards found 
appropriate. 
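
One way such fixed standards might be imposed is sketched
below. This is an assumption about method, not a description of
any program in the project: a linear transformation is derived
once from a norming sample of raw machine scores, and is then
applied unchanged to any later group. All scores and target
values are invented for illustration.

```python
from statistics import mean, pstdev


def make_scaler(norming_scores, target_mean, target_sd):
    """Build a fixed linear transformation from a norming sample of raw machine scores."""
    m = mean(norming_scores)
    s = pstdev(norming_scores)
    if s == 0:
        raise ValueError("norming scores have no spread")

    def scale(raw_score):
        return target_mean + target_sd * (raw_score - m) / s

    return scale


# Hypothetical raw scores for the norming sample, and hypothetical expert-group standards.
norming_raw = [11.2, 9.8, 14.5, 12.1, 10.4]
scale = make_scaler(norming_raw, target_mean=3.0, target_sd=1.0)

normed = [scale(x) for x in norming_raw]
later_group = [scale(x) for x in (13.0, 9.0)]  # the same standards, applied to new essays
print(round(mean(normed), 2), round(pstdev(normed), 2))
```

Because the transformation is fixed in advance, a later group
of any size is marked against the same mean and dispersion.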

In the question of accuracy, then, as this quality is 
ordinarily thought of, the automatic system has some large 
advantages over the human system, and these are advantages 
which cannot be easily demonstrated in statistical compari- 
sons. But they should be kept in mind for any thinking 
about applications. 

Using one essay's proxes for another essay's criterion.
One way to find out what proxes might have the greatest
stability, in terms of measuring important aspects of a
student's characteristic writing behaviors, might be to
use the proxes from one essay in a multiple regression for
another essay by the same students. The reasoning may be
obvious: There are certain aspects of student behavior
which might influence the human judgment of one essay, but
not be much related to the student's long range performance.
If we cross the proxes and criterion, then, in the way
described, we may be tapping more enduring aspects of
writer behavior.

To investigate this question, the proxes from Essay D 
were used in a new multiple-regression analysis to predict
the pooled human judgments for overall quality for Essay C. 
The resulting MULTR was .62, which as would be expected 
was a considerable drop from the .71 obtained with Essay 
C's own proxes. Table IV-12 shows the summary data of 
interest from this analysis. Column B represents the cor- 
relation of the proxes with the criterion, and Column C 
represents the beta weighting of each prox in the analysis. 

Inspection of these columns, and a comparison of
them with their counterparts in Table IV-7 and Table IV-3,
do not provide any very transparent explanation for the
decrement in prediction. A hint may be gained from the
slightly lower correlation of essay length with the
criterion; students may have more to say on one subject than
on another, and this fluency may affect the rater's
judgment. And the beta weight for essay length has also dropped
markedly (from .32 for Essay D's own criterion, to .21 for
Essay C's criterion), which bolsters this suggestion. A
comparison of another contributor, standard deviation of
word length, shows a similar decrement in beta weight, but
an actual increase in the bivariate correlation with the
criterion, compared with the Essay D table (IV-3).

In summary of this trial, then, the data are difficult
to interpret verbally, but seem to argue that the decrement
in multiple correlation may be a reflection of the true
difference in student performance across essay topics.




                        TABLE IV-12

              ESSAY D PROXES USED TO PREDICT
                   AN ESSAY C CRITERION

                A.                        B.           C.
              Proxes                  Corr. with    Beta wts.
                                      Criterion

  1.  Title present                      .03          .07
  2.  Av. sentence length               -.01         -.22
  3.  Number of paragraphs               .08          .02
  4.  Subject-verb openings             -.14          .02
  5.  Length of essay in words           .19          .15
  6.  Number of parentheses              .11          .05
  7.  Number of apostrophes             -.19         -.05
  8.  Number of commas                   .37          .18
  9.  Number of periods                  .00         -.09
 10.  Number of underlined words        -.03         -.07
 11.  Number of dashes                   .22          .06
 12.  No. colons                         .08          .04
 13.  No. semicolons                     .04         -.00
 14.  No. quotation marks                .17          .07
 15.  No. exclamation marks             -.07         -.05
 16.  No. question marks                -.08         -.??
 17.  No. prepositions                   .17          .02
 18.  No. connective words               .17          .02
 19.  No. spelling errors               -.09         -.02
 20.  No. relative pronouns              .04          .06
 21.  No. subordinating conjs.          -.13          .04
 22.  No. common words on Dale          -.44         -.09
 23.  No. sents. end punc. pres.         .01          .04
 24.  No. declar. sents. type A          .08          .01
 25.  No. declar. sents. type B          .06          .02
 26.  No. hyphens                        .24          .14
 27.  No. slashes                       -.01          .01
 28.  Aver. word length in ltrs.         .45          .10
 29.  Stan. dev. of word length          .48          .21
 30.  Stan. dev. of sent. length        -.04


Cross-validation with same essays. As we have already
suggested, the question of validity is a complicated one.
One acceptable form of validity is surely the prediction of
future behavior by the same student. But what is often
meant is rather the prediction of what expert humans might
say about a student's performance. This would become a
kind of "concurrent" validity.

In the context of the present project, such concurrent
validity would consist of seeing to what degree the machine
scorings of the essays would coincide with the human
scorings. In one sense, we already know the result to be .71,
since this was the discovered multiple correlation for both
C and D essays, and represents just what is described:
a measure of correlation between the machine scores and the
human scores. As we have seen, however, such a
coefficient capitalizes upon chance, and should be shrunken
statistically. This has also been done, with a resultant
shrunken (Wherry) coefficient of .67.

Even the shrunken coefficient is not completely satis- 
fying, however, because of the fact that empirical data 
often deviate from the assumptions upon which such statis- 
tical manipulation is based. Besides, it is desirable to 
know how the machine algorithm will correlate with the 
individual judges. 

For these reasons, it is most desirable to select
randomly among the essays, and generate the weightings from
this sub-sample, after which the weightings may be used
to assign scores to those essays not included in the
multiple-regression analysis. These machine scores may then
be correlated with the human ratings of the
excluded essays, and this new correlation will represent a
very appropriate measure of validity. The result of such a
procedure is exhibited in Table IV-13.
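
The procedure just described may be sketched as follows. The
data are synthetic stand-ins (two invented predictors and an
invented rating), since the essays and proxes themselves are
not reproduced here; only the division of 276 cases into two
random halves of 138 follows the text. The least-squares
routine is a plain normal-equations solution written for the
sketch, not the project's multiple-regression program.

```python
import math
import random


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_weights(rows, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(d[i] * d[j] for d in design) for j in range(k)]
         + [sum(d[i] * t for d, t in zip(design, targets))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def score(weights, row):
    """Apply fitted weights to one essay's predictor values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Synthetic stand-in data: 276 "essays", two made-up proxes, and a made-up rating.
rng = random.Random(0)
proxes = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(276)]
ratings = [2.0 * p1 + 1.0 * p2 + rng.gauss(0, 1) for p1, p2 in proxes]

order = list(range(276))
rng.shuffle(order)
build, holdout = order[:138], order[138:]

# Weights come from one half only; scores are assigned to the other half.
weights = fit_weights([proxes[i] for i in build], [ratings[i] for i in build])
machine = [score(weights, proxes[i]) for i in holdout]
cross_validated_r = pearson_r(machine, [ratings[i] for i in holdout])
print(round(cross_validated_r, 2))
```

Since the excluded essays play no part in fixing the weights,
the resulting coefficient cannot capitalize on their chance
peculiarities.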














                        TABLE IV-13

              CROSS-VALIDATION COMPARISON OF
            THE COMPUTER WITH FOUR HUMAN JUDGES
                      (Essay N = 138)

                          Judges

(Correlation matrix among Judges A through E; the cell
values are not preserved.)

Note: Judge C is the computer. All cells represent correlation
coefficients generated by comparing four human judge columns
with machine scores on the same essays. The machine scores were
those generated from 138 other essays written by other students,
chosen at random from the same larger sample.



For this table, the computer-assigned scores, then,
were generated from an analysis on 138 of the D essays,
which were chosen by random methods from the 276 total
sample. The weightings derived from the analysis were
then applied to essays written by 138 other students, and
the scores so assigned were correlated with the scores
assigned by four human judges (Page, 1967a).

This Table IV-13 has often been presented to audiences
as the clearest and simplest evidence of the effectiveness
of machine strategies, and it has usually been presented
without telling the audience which column in fact
represents the computer. It is very difficult to guess which
one it would be, yet given sufficient time, the occasional
sophisticated psychometrician may be able to reason out
that Column C is the most probable, and is indeed the
computer column. The reason why this is detectable is again
characteristic of the difference between man and machine.
Surely the machine is not measuring the essay quality in
the same way as the man. The machine is surely failing to
attend to many of the important syntactic and semantic
properties which influence the human judge. But the machine
is in one sense more reliable than the human judge, and it
is the reliability which gives it away: The coefficients
for the machine (Column C) range from only .48 to .53,
whereas the coefficients for the human judges (Columns A,
B, D, and E) have ranges which are typically three times
as large. The machine agrees with the human judges more
consistently, then, than they agree with each other!

Practical implications. Although striking, Table IV-13
does not merely represent a simple trick. Rather, it
is the clearest analog so far to what might come from
a large-scale, machine-based essay evaluation. For
example, in a national essay quiz (such as the writing
sample occasionally taken by the College Entrance
Examination Board), the procedures would probably be quite similar
to those which led to this table. In prior years, a number
of essay topics would each be assigned to a fairly small
test sample of students. Their writings would be
intensively analyzed by expert humans for whatever research and
norming purposes might be desired. Then from this pool of
tested essay assignments, one stimulus would be chosen to
be used across the country on the day of the major test.
When collected, these essays would typically not be
examined by human judges at all, but would rather be analyzed
by computer programs already developed from the test
sample. Only a few would subsequently be analyzed by
human beings, to check for possible drift in sample, or for
historical developments which might have altered the essay
topic. In general, the crash scoring, in this hypothetical
testing, would typically be entirely mechanical.

Summary. Earlier chapters introduced the rationale,
basic design, and initial proxes used in this study, and
have presented the computer program used in their
measurement. The present chapter has presented some of the
findings from the study bearing on the basic questions of the
agreement of the human judges with each other, and with
machine scores of the same and of different essays. It
has furthermore presented information about the proxes:
their intercorrelation, their prediction of human judgments,
and their reliability across trials. In most important
comparisons, the machine scores were found to be
practically interchangeable with the human ratings. This
finding was most important when various types of validity were
analyzed, one based upon prediction of student performance
on another occasion, and the other applying measures
generated from one set of students to a wholly different
set of students. Some comment was also made about
inferences from these findings for practical work in the future.



CHAPTER V 



PREDICTING A PROFILE OF RATINGS 

The last chapter explained in considerable detail
the results from the attempts to predict the overall
rating of some sets of essays. It was seen that the
simulation was indeed very successful, and that the level of
success had a number of implications for the future of
such work. This chapter considers the more advanced case
of simulating an analytic profile of an essay.

If a single overall rating were the only outcome
from analysis, it would be a satisfactory substitute for
some major tasks today, such as national essay exams
(where a single pooled judgment is the usual product of
evaluation), or many classroom situations (in which an
overworked teacher marks only a letter grade and some
redundant comment on a returned essay). Nevertheless, any
substantial essay analysis must seek a level of performance
nearer to that of the ideal teacher: with a much richer
profile of the traits of the writing, so that students
(and their instructors) may concentrate differentially
on relatively weak skills in the profile; and with more
detailed and direct comment about specific patterns or
errors in the student's work. This chapter will
concentrate on the trait ratings of the essays, and Chapter VIII
will give some attention to the detailed and personal
comment to the student.

The sample. For the reasons set forth in an earlier
chapter, there was no cause for dissatisfaction with the
Wisconsin essays (McColly and Remstad, 1963). They did
not represent a typical high school student body, but the
range was wide, and, with such early strategies, no
important interactions with selections were expected.
Furthermore, a number of replications had been successfully
performed with other essays, with second essays written by
the same students, and with a set of essays written by
students in Indiana in an unrelated study conducted by
Anthony Tovatt and his colleagues at Ball State University
under the U. S. Office of Education (and reported
elsewhere by Dr. Tovatt). For the phase of work considered
in this chapter, therefore, it was decided to continue the
analysis, this time working most intensively with Essay C
(those based upon the question of whether "the best things
in life" were really "free").

The new data needed were judgmental, then, for no
evaluative data existed for the Wisconsin essays beyond
the simple rating of overall quality. We therefore wished
to establish a reliable set of ratings which would
constitute a sensible descriptive profile of the strengths and
weaknesses most commonly looked for in stylistic judgment.
For such a requirement we would need: (1) a set of
established and accepted dimensions; (2) a selection of
judges who would be representative of qualified English
teachers in general; (3) a sufficient number of
judgments to overcome the inevitable halo effects, and to
establish in truth a meaningful profile.

Just as with the essays, the investigators could
afford to be reasonably relaxed about any randomness of
selection, so long as judges met the general, personal and
professional criteria, because stratification of region,
type of school, and a myriad other possible considerations
seemed unimportant so far as these particular
generalizations of result are concerned. While there are
differences among teachers in such dimensions, interaction of
such dimensions with the purposes of the study seemed of
negligible importance. And what seemed of much greater
importance was the control of the rating situation.



The rating session. On July 16, 1966, then, under the
principal supervision of Dr. Arthur Daigon, 32 English
teachers met at the School of Education of the University
of Connecticut for the purpose of grading student compositions
for multiple traits. Because this group's judgments
of student writing were to represent the evaluations of 
highly competent professionals, evaluations which would 
subsequently be simulated by a computer, selection of 
participants (setting randomness aside) was done with 
considerable care. 

Ten chairmen of English departments in Connecticut 
secondary schools were invited to participate with teachers 
whom they could recommend as having special competence in 
the grading of student compositions, and who had at least 
three years of teaching experience. The department chair- 
men were also requested to give first preference to those 
skilled teachers who possessed master's degrees. 

Of the 32 teachers who participated, 10 (31%) were
department chairmen and 28 (87%) possessed M.A. degrees.
The mean number of years of English teaching experience 
was 12.9 years, the median, 10 years. 

Before the grading session began, the teachers were 
welcomed and acquainted with their task. Each would grade 
64 compositions, assigning separate grades on a 5-point
scale for each of 5 traits designated as "ideas or content",
"organization", "style", "mechanics", and "creativity".

Each English teacher-judge received both written and oral
instructions relating to identification and scoring of the
traits. Samples of a "good" composition and of a "poor"
composition were distributed and considered in order to
demonstrate how the traits could be scored and to suggest
a range of possible response.

Eight 30-minute time periods were established. During
each period each judge evaluated 8 compositions, which
allowed about 3-1/2 minutes for the multiple trait judging
of each composition. These arrangements permitted 8 judgments
for each of 5 traits in each of the total of 256
compositions.
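
The arithmetic of the session design can be checked with a short sketch. The constant names are illustrative only; the figures themselves come from the paragraphs above. The 3.75-minute value is the ceiling implied by eight essays in thirty minutes, and the "about 3-1/2 minutes" quoted above presumably allows for handling time, which is an assumption here.

```python
# Rating-session design, using the figures given in the text.
JUDGES = 32
PERIODS = 8
ESSAYS_PER_PERIOD = 8      # essays read by one judge in one period
ESSAYS = 256
TRAITS = 5
PERIOD_MINUTES = 30

readings_per_judge = PERIODS * ESSAYS_PER_PERIOD          # essays read by each judge
total_readings = JUDGES * readings_per_judge              # judge-essay pairings
judgments_per_essay = total_readings // ESSAYS            # readers per essay
trait_ratings = total_readings * TRAITS                   # individual trait grades
minutes_per_essay = PERIOD_MINUTES / ESSAYS_PER_PERIOD    # upper bound per essay

print(readings_per_judge, total_readings, judgments_per_essay,
      trait_ratings, minutes_per_essay)
```

The total of 10,240 trait ratings agrees with the 10,239 total degrees of freedom reported later in Table V-10.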

The assignment of essay to teacher and period was a
formidable task, which required the computer. The problem
was manifold: each essay could be assigned only once in
each of the eight periods, so that there would be no
important period effect hurting the essay evaluations, and
so that we would not need multiple copies of the essays.
Yet no essay could be given to any judge more than once.
And it was also desirable to randomize the order of presentation
within a period. These problems are rather
easily solved if groups of eight essays are kept together, 
but such a procedure would obviously distort the evalua- 
tion of an essay in unknown ways. There is another major 
problem, in that it is easy to continue random assignment 
up to the last period, and find unresolvable conflicts of 
assignment, requiring branching back to some earlier point. 
But eventually, with intensive work, the problem was 
solved and made completely automatic. 
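
The constraints just listed can be illustrated with a minimal sketch. It is not the project's program, which is not described in detail: instead of branching back to an earlier point, this version shuffles each period and repairs conflicts by random swaps, reshuffling the period if repair fails. The function names and the seed parameter are hypothetical.

```python
import random

def repair_conflicts(blocks, seen, rng, max_tries=10000):
    """Swap essays between judges until no judge holds an essay already read."""
    for _ in range(max_tries):
        conflicts = [(j, i) for j, block in enumerate(blocks)
                     for i, essay in enumerate(block) if essay in seen[j]]
        if not conflicts:
            return True
        j, i = rng.choice(conflicts)
        k = rng.randrange(len(blocks))
        m = rng.randrange(len(blocks[k]))
        # Swap only if both judges receive an essay they have not yet read.
        if k != j and blocks[k][m] not in seen[j] and blocks[j][i] not in seen[k]:
            blocks[j][i], blocks[k][m] = blocks[k][m], blocks[j][i]
    return False

def assign_essays(n_essays=256, n_judges=32, n_periods=8, seed=0):
    """Return schedule[period][judge] = essays in random reading order."""
    rng = random.Random(seed)
    per_judge = n_essays // n_judges
    seen = [set() for _ in range(n_judges)]
    schedule = []
    for _ in range(n_periods):
        while True:
            essays = list(range(n_essays))
            rng.shuffle(essays)           # each essay appears once per period
            blocks = [essays[j * per_judge:(j + 1) * per_judge]
                      for j in range(n_judges)]
            if repair_conflicts(blocks, seen, rng):
                break                     # otherwise reshuffle this period
        for j, block in enumerate(blocks):
            seen[j].update(block)
        schedule.append(blocks)
    return schedule

schedule = assign_essays()
print(schedule[0][0])   # essays for judge 0 in the first period
```

Because essays are reshuffled every period, groups of eight essays do not travel together from judge to judge, which is the distortion the text warns against.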

The mechanics of the rating day were not a trivial 
problem, however, since we needed six graduate assistants 
performing the reassignments. Luckily, though, they could 
use punched and interpreted assignment cards, a by-product 
of the computerized assignment program, which also served 
as mark-sense rating records for later analysis. 

The rating criteria . In choosing the traits for 
rating, we desired well-established dimensions of writing 
quality. One of the most helpful documents was an eight- 
scale evaluation designed by Paul Diederich, and used at 
the Educational Testing Service ("Definitions of Ratings
on the ETS Composition Scale," no date). Figure V-1
shows our adaptation of such suggestions. 

Naturally, there are large differences among raters 
in their evaluations of the same essays. Since each 
rater read 64 of the essays, and since this number repre- 
sents one fourth of the 256 used in the design, the proba- 
bility of any one essay being read by any judge is clearly 
1/4, and we might expect that, of the 64 essays read by 
judge A, about 1/4, or about 16, would be read by judge B. 
Clearly, with such small N's, we would not expect very 
secure estimates of the population agreement between two 
judges, but would rather expect a large random error. 
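
The expected overlap between two judges can be worked out directly. The sketch below gives the approximation used in the text and, as a comparison, the value that follows if each essay is read by exactly eight different judges with all judges treated alike; that symmetry is an assumption of the sketch, and the variable names are illustrative.

```python
from fractions import Fraction

ESSAYS = 256
JUDGES = 32
READ_PER_JUDGE = 64
READERS_PER_ESSAY = 8

# Approximation in the text: judge B reads any given essay with probability 1/4.
approx_overlap = READ_PER_JUDGE * Fraction(READ_PER_JUDGE, ESSAYS)

# Given that judge A read an essay, its other 7 readers are drawn
# from the remaining 31 judges.
exact_overlap = READ_PER_JUDGE * Fraction(READERS_PER_ESSAY - 1, JUDGES - 1)

print(approx_overlap, float(exact_overlap))
```

Either way the number of essays shared by a pair of judges is small, which is the point of the paragraph: pairwise agreement estimates rest on roughly fifteen observations.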

Table V-1 shows the intercorrelations among all 32 
judges. The correlations are based upon the "total" 
scores, which were the average of the five trait ratings 
given by any judge to an essay. As can be seen, the
median judge intercorrelation hovers around .5 for these
total scores.

The judges were instructed, as is clear in Figure V-1,
to balance their ratings into a certain distribution,
approximately normal, and it would be expected that their
means and standard deviations would therefore be approximately
equal. Table V-2 shows that this is indeed the
case. Since 5 represents the best rating, and 1 represents
the worst, the requested distribution centers on 3, and the
means in Table V-2 all lie close to that midpoint, between
2.83 and 3.07. The nebulous
"ideas" or "content" is the most tolerantly graded, with
"organization" in second place. "Mechanics" has a middle
position of severity of marking, and has decidedly the
largest standard deviation of any trait. Teachers were
thus more decisive about mechanics and, as we shall see
later, they agreed more with each other about mechanics
than they did about the other dimensions of essay quality.
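
As a rough check on what the requested balance implies, the sketch below takes the counts given in Figure V-1 (about 2, 4, 6, 4, and 2 essays at ratings 5 through 1) and computes the mean and standard deviation of a single judge's ratings if the guidance were followed exactly. The counts sum to 18 rather than 16, so they are treated here as relative weights; that reading is an assumption.

```python
import math

# Relative frequencies requested for each rating (Figure V-1).
target_counts = {5: 2, 4: 4, 3: 6, 2: 4, 1: 2}

n = sum(target_counts.values())
mean = sum(rating * count for rating, count in target_counts.items()) / n
variance = sum(count * (rating - mean) ** 2
               for rating, count in target_counts.items()) / n
sd = math.sqrt(variance)

print(mean, round(sd, 3))
```

The standard deviations in Table V-2 are smaller than this single-judge figure, presumably because each tabled score is an average over eight judges.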



FIGURE V-1

CRITERIA FOR RATING THE ESSAYS

I. Definitions of the basic traits to be rated.

   A. Ideas or Content. The quantity and quality of the materials
      used to cover the subject.

   B. Organization. The relationship between the parts of the
      paper and the whole.

   C. Style. The use of language above and beyond the
      problems of mechanics.

   D. Mechanics. Spelling, grammar, usage, punctuation,
      capitalization, numbers.

   E. Creativity. The degree to which the paper finds a new,
      unexpected, yet fruitful way to approach
      the subject, to combine ideas, and to
      utilize language. An over-all trait.

II. Guides for rating the basic traits.

A. IDEAS OR CONTENT

High.   The student covers the materials that the topic and plan
        of attack clearly call for. His understanding of the
        subject is good and he uses clear definitions. He has
        the ability to see the topic in a broader perspective
        than do the other students in his group, that is, he
        brings a broader experience to the topic.

Middle. The ideas are appropriate, but conventional and few in
        number. Some aspects of the topic are left out. The
        writer does not seem to have a well-stocked mind.

Low.    The student omits many important aspects of the topic.
        He seems to have no store of knowledge to bring to bear
        on the topic and consequently repeats a few simple ideas
        over and over again.

B. ORGANIZATION

High.   The student has a definite plan for discussing the
        assigned topic. If he is arguing for or against an
        idea, he presents relevant reasons in an effective
        order. If he is describing something, he does so
        according to some scheme (top to bottom, order of
        importance, order of complexity, etc.). If the student is
        explaining a concept or process he uses a coherent plan
        of analysis, or definition, or illustration. The student
        has a good sense of what is relevant to his plan and
        avoids repetition. He shows a sense of proportion in
        treating the various parts of his essay.
Middle. The student shifts his plan of discussion, or introduces
        irrelevant material, or spends too much time on unimportant
        things, or repeats himself. He develops the assigned
        topic by free association (what comes to my mind when
        I think of Hawaii?) rather than by working toward a
        definite purpose.

Low.    The student does not seem to have given any thought to
        what he intended to say before he started to write. He
        offers no plan of discussion. The paper seems to start
        in one direction, then another, then another, until the
        reader is lost. The main points are not clearly separated
        from one another, and they come in a random order.

C. STYLE

(There are many aspects of style that may enter into a rating:
individuality, vividness, elegance, etc. However, for the purposes
of this experiment we are interested in three stylistic traits
only: clarity, variation, and range of linguistic resources.)

High.   The student uses language in a way that makes comprehension
        of the paper easy. He uses appropriate words in
        their normal sense. He puts the words in their normal
        order. He is careful to signal his transitions. He
        avoids ambiguity and he does not frustrate the reader's
        expectations. At the same time the student avoids
        monotonous repetitions of words, phrases, and sentence
        structures. Finally, he reveals a command of a good
        range of linguistic resources. His vocabulary is good,
        he uses parallel structures, he makes subtle use of
        subordination, and so on.

Middle. The student occasionally brings the reader up short by
        choosing a bizarre, inappropriate word or phrase, or by
        introducing a distracting metaphor, or by misplacing a
        modifying phrase or clause, or by making unexpected
        transitions. The repetitions of words, phrases, and
        sentence structures become monotonous. The resources of
        language are limited. The writer is addicted to tired
        old phrases and hackneyed expressions.

Low.    Vague use of words. Ambiguous references. Awkward
        constructions. Childish vocabulary and sentence structure.
D. MECHANICS

High.   The sentence structure is usually correct, even in varied
        and complicated sentence patterns. No violation of
        established spelling rules. Even the hard words are
        usually spelled correctly. No serious violations of the
        rules of punctuation, capitals, abbreviations, and numbers.

Middle. An occasional syntax problem. Hard words are occasionally
        misspelled. Some violations of the rules concerning
        punctuation, etc.

Low.    The student borders on the illiterate.

E. CREATIVITY

High.   The student surprises us with a new and fruitful way of
        looking at the problem. He brings to bear new data in
        treating the topic. He finds a fresh and interesting way
        of using language that illuminates his ideas.

Middle. The student thinks of the expected things. He treats
        them in a way that most people would treat them. He
        makes use of ordinary expressions and sentence structures.

Low.    The student works with cliches of thought and expression.
        Does not go beyond the most superficial treatment of the
        subject. Repeats formulas without really grasping their
        meaning.

Try for the following overall balance

RATING

5   TOP           10% or so.   About 2 of each 16 essays.

4   NEXT TOP      25%.         About 4 of each 16 essays.

3   MIDDLE        35%.         About 6 of each 16 essays.

2   NEXT BOTTOM   25%.         About 4 of each 16.

1   BOTTOM        10% or so.   About 2 of each 16.
TABLE V-2

MEANS AND STANDARD DEVIATIONS OF THE
FIVE TRAITS AND THEIR AVERAGE

                                   Standard
     Trait              Mean       Deviation

1. Ideas                3.068        .640
2. Organization         2.950        .675
3. Style                2.827        .619
4. Mechanics            2.869        .771
5. Creativity           2.833        .641

6. Trait average        2.909        .610



Contribution of the proxes. It is of interest to see
how each prox contributed to the prediction of the five
various traits, and this information is contained in the
next five tables (V-3 to V-7). The information for the
average of the five traits is contained in Table V-8. The
first column has the number of the proxes and, in the first
of these tables (V-3), the name of the prox as well. Column
B has the correlation with the criterion for each prox.
Column C has the B-weights for each prox. And Column D
has the computed t-values for each prox.
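
Each of these tables also closes with a multiple correlation, a standard error of estimate, and an "F multr." value. The sketch below shows how those three summary figures hang together, assuming the usual regression formulas with 30 proxes and 256 essays; the formulas and function names are supplied here and are not stated in the report. The multiple correlations and reported values are those of Tables V-3 to V-8, and the standard deviations are from Table V-2.

```python
import math

N = 256   # essays
K = 30    # proxes entered as predictors

def f_from_multiple_r(r, n=N, k=K):
    """F ratio for the overall regression, computed from the multiple R."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def std_error_of_estimate(r, sd_criterion, n=N, k=K):
    """Standard error of estimate, adjusted for the degrees of freedom used."""
    return sd_criterion * math.sqrt((1.0 - r * r) * (n - 1) / (n - k - 1))

# trait: (multiple R, criterion SD, reported F, reported std. error)
reported = {
    "ideas":        (0.72301, 0.640, 8.21, 0.47093),
    "organization": (0.61455, 0.675, 4.55, 0.56709),
    "style":        (0.72951, 0.619, 8.53, 0.45042),
    "mechanics":    (0.67796, 0.771, 6.38, 0.60314),
    "creativity":   (0.70938, 0.641, 7.60, 0.48074),
    "average":      (0.72145, 0.610, 8.14, 0.44944),
}

for trait, (r, sd, f_rep, se_rep) in reported.items():
    print(f"{trait:13s} F={f_from_multiple_r(r):5.2f} (reported {f_rep})  "
          f"SE={std_error_of_estimate(r, sd):.4f} (reported {se_rep})")
```

Under these assumptions the computed values agree with the tabled ones to within rounding of the three-decimal standard deviations.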

Column C, which has the B-weights for each prox, should 
not be confused with the Beta weights given, for example, 
in Table IV-3 of the last chapter. The B weight is the 
coefficient which is actually used, together with the raw 
prox score, to optimize the predictive value of any prox 
in an applied situation. In other words, given two proxes 
of the same Beta coefficients, the one having a larger 
standard deviation will have a smaller B-weight. While these
B-weights may not be compared directly with those Beta co- 
efficients given in the last chapter, they may be compared 
with the corresponding B-weights of the other traits given 
in this chapter, though any such comparison would be simply 
monotonic, and differences could not be easily compared between
proxes.
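
The relation between the two kinds of weight can be written out in a few lines. This is the standard conversion between a standardized (Beta) coefficient and a raw-score (B) coefficient; the function name and the example standard deviations are hypothetical and are not taken from the project data.

```python
def b_weight(beta, sd_criterion, sd_prox):
    """Raw-score weight B from a standardized weight Beta."""
    return beta * sd_criterion / sd_prox

# Two hypothetical proxes with the same Beta but different spreads.
beta = 0.30
sd_trait = 0.64                  # SD of the rated trait
b_narrow = b_weight(beta, sd_trait, sd_prox=2.0)    # e.g. a count with small SD
b_wide = b_weight(beta, sd_trait, sd_prox=10.0)     # e.g. a count with large SD

print(b_narrow, b_wide)
```

The prox with the larger standard deviation receives the smaller B-weight, as the paragraph above says, which is why B-weights cannot be compared across proxes measured on different scales.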

The relative contribution of each prox may be inferred
from the t-values in Column D. The absolute values of
these t's are monotonically related to the rank order of
contribution of the proxes to the prediction. For example,
Table V-3 shows that the highest contribution was made by
the fifth prox, showing a t-value of 6.38, far ahead of any
other. When it is considered that Prox #5 is "length of
essay," and that the trin is "ideas or content," we are
struck by the obviousness of the relation. The more words
used, the more content the essay is believed to have. For
TABLE V-3

PROX CONTRIBUTION TO THE PREDICTION OF
IDEAS OR CONTENT
(N = 256)

             A.                        B.             C.          D.
           Proxes                  Corr. with       B wts.     t-value
                                   Criterion

 1. Title present                     .01           .14973       1.37
 2. Av. sentence length              -.07          -.00341      -1.73
 3. Number of paragraphs          [illegible]      -.03357      -1.79
 4. Subject-verb openings            -.19          -.00326      -1.38
 5. Length of essay in words          .37           .00205       6.38

 6. Number of parentheses            -.04          -.00875      -1.23
 7. Number of apostrophes            -.11          -.00640      -1.73
 8. Number of commas                  .37           .00601       3.73
 9. Number of periods                 .03          -.00007      -0.01
10. Number of underlined words       -.04          -.01053      -1.14

11. Number of dashes                  .36           .03283       4.63
12. No. colons                        .07           .02032       1.03
13. No. semicolons                    .13           .00964       0.90
14. No. quotation marks               .15           .00015       1.25
15. No. exclamation marks            -.05          -.00272      -1.26

16. No. question marks                .08          -.00121      -0.56
17. No. prepositions                  .17           .02926       1.54
18. No. connective words              .09           .00797       0.12
19. No. spelling errors              -.10          -.05672      -0.55
20. No. relative pronouns            -.07           .03506       1.09

21. No. subordinating conjs.         -.09           .02955       0.88
22. No. common words on Dale         -.34          -.00270      -0.22
23. No. sents. end punc. pres.        .17           .03073       1.33
24. No. declar. sents. type A        -.00          -.01298      -0.60
25. No. declar. sents. type B         .04          -.01483      -0.52

26. No. hyphens                       .23           .00717       0.99
27. No. slashes                       .05           .05393       0.66
28. Aver. word length in ltrs.        .34          -.00094      -0.34
29. Stan. dev. of word length         .43           .00921       3.17
30. Stan. dev. of sent. length        .11           .00200       1.38

Intercept constant                 -1.01123
Multiple correlation                0.72301
Std. error of estimate              0.47093
F multr. = 8.21
TABLE V-4

PROX CONTRIBUTION TO THE PREDICTION OF
ORGANIZATION
(N = 256)

   A.            B.              C.           D.
 Proxes      Corr. with        B wts.      t-value
             Criterion

   1.           .01            .10786        0.82
   2.          -.11           -.00229       -0.96
   3.           .10           -.01289       -0.57
   4.          -.20           -.00343       -1.21
   5.           .23            .00124        3.20
   6.          -.08           -.01417       -1.65
   7.          -.08           -.00253       -0.57
   8.           .31            .00515        2.65
   9.           .08            .00399        0.53
  10.          -.06           -.01267       -1.14
  11.           .26            .02136        2.50
  12.           .07            .02091        0.88
  13.           .10            .00449        0.35
  14.           .18            .00035        2.44
  15.          -.03           -.00287       -1.11
  16.           .08           -.00199       -0.78
  17.           .12            .01855        0.81
  18.           .13            .05610        0.72
  19.          -.16           -.24534       -1.98
  20.          -.07            .03517        0.91
  21.          -.10            .01539        0.38
  22.          -.33           -.00580       -0.39
  23.           .17            .04423        1.59
  24.          -.01         [illegible]     -0.95
  25.           .06           -.02201       -0.64
  26.           .20            .00527        0.60
  27.           .02            .01598        0.16
  28.           .33            .00082        0.25
  29.           .38            .00695        1.99
  30.           .05            .00118        0.68

Intercept constant       -1.29180
Multiple correlation      0.61455
Std. error of estimate    0.56709
F multr. = 4.55



TABLE V-5

PROX CONTRIBUTION TO THE PREDICTION OF
STYLE
(N = 256)

   A.            B.              C.           D.
 Proxes      Corr. with        B wts.      t-value
             Criterion

   1.          -.02            .10535        1.01
   2.          -.12           -.00385       -2.04
   3.           .07           -.03715       -2.08
   4.          -.17            .00067        0.30
   5.           .23            .00123        3.99
   6.          -.02           -.00505       -0.74
   7.          -.07           -.00254       -0.72
   8.           .41            .00624        4.05
   9.           .08            .00316        0.53
  10.          -.05           -.01147       -1.30
  11.           .32            .02669        3.97
  12.           .06           -.00236       -0.13
  13.           .13            .00880        0.86
  14.           .16            .00019        1.64
  15.          -.01           -.00329       -1.60
  16.           .11         [illegible]     -1.35
  17.           .19         [illegible]      2.28
  18.           .15            .11271        1.82
  19.          -.15           -.20081       -2.04
  20.          -.06            .05119        1.66
  21.          -.09            .04533        1.41
  22.          -.39           -.00476       -0.40
  23.           .18            .04894        2.22
  24.          -.04           -.03366       -1.62
  25.           .05           -.02750       -1.00
  26.           .32            .01625        2.34
  27.           .05            .05503        0.70
  28.           .40           -.00044       -0.17
  29.           .47            .00906        3.26
  30.           .11            .00315        2.28

Intercept constant       -1.33964
Multiple correlation      0.72951
Std. error of estimate    0.45042
F multr. = 8.53
TABLE V-6

PROX CONTRIBUTION TO THE PREDICTION OF
MECHANICS
(N = 256)

   A.            B.              C.           D.
 Proxes      Corr. with        B wts.      t-value
             Criterion

   1.          -.00            .11647        0.83
   2.          -.17           -.00258       -1.02
   3.          -.01           -.02655       -1.11
   4.          -.08            .00380        1.26
   5.           .06            .00053        1.29
   6.          -.02           -.00483       -0.53
   7.          -.07           -.00234       -0.49
   8.           .29            .00579        2.80
   9.           .16            .00684        0.85
  10.          -.06           -.00745       -0.63
  11.           .23            .02192        2.42
  12.           .12            .03514        1.39
  13.           .10            .01353        0.99
  14.           .09            .00017        1.07
  15.       [illegible]        .00106        0.39
  16.           .04            .00288        1.05
  17.           .17            .06407        2.64
  18.           .14            .17301        2.09
  19.       [illegible]       -.68565       -5.21
  20.       [illegible]        .01675        0.41
  21.          -.06            .04182        0.97
  22.       [illegible]        .03763        2.36
  23.           .21            .00580        0.20
  24.       [illegible]        .02471        0.89
  25.          -.01            .00433        0.12
  26.       [illegible]        .02133        2.30
  27.           .08            .10975        1.05
  28.           .39            .00535        1.54
  29.           .42            .01128        3.04
  30.           .00            .00136        0.74

Intercept constant       -9.53911
Multiple correlation      0.67796
Std. error of estimate    0.60314
F multr. = 6.38
TABLE V-7

PROX CONTRIBUTION TO THE PREDICTION OF
CREATIVITY
(N = 256)

   A.            B.              C.           D.
 Proxes      Corr. with        B wts.      t-value
             Criterion

   1.       [illegible]        .13550        1.21
   2.          -.12           -.00037       -0.19
   3.           .09           -.05943       -3.11
   4.          -.17           -.00130       -0.54
   5.           .34            .00208        6.36
   6.          -.08           -.01716       -2.36
   7.           .00           -.00026       -0.07
   8.           .39            .00648        3.94
   9.           .11            .01684        2.64
  10.          -.06           -.01797       -1.91
  11.           .29            .02680        3.70
  12.           .09            .01817        0.90
  13.           .16            .02121        1.94
  14.           .12            .00015        1.18
  15.           .00           -.00363       -1.65
  16.           .10           -.00295       -1.35
  17.           .13            .02480        1.28
  18.           .07            .02156        0.33
  19.          -.11           -.09013       -0.86
  20.          -.07            .04230        1.29
  21.          -.08            .06215        1.81
  22.          -.32           -.01342       -1.06
  23.           .15            .03678        1.56
  24.          -.05           -.03993       -1.80
  25.           .04           -.04165       -1.42
  26.           .30            .01846        2.49
  27.           .05            .06153        0.74
  28.           .28           -.00322       -1.16
  29.           .36            .00820        2.77
  30.           .10            .00232        1.57

Intercept constant        1.21571
Multiple correlation      0.70938
Std. error of estimate    0.48074
F multr. = 7.60
TABLE V-8

PROX CONTRIBUTION TO THE PREDICTION OF
AVERAGED RATING ACROSS 5 TRAITS
(N = 256)

   A.            B.              C.           D.
 Proxes      Corr. with        B wts.      t-value
             Criterion

   1.          -.00            .12299        1.18
   2.       [illegible]       -.00250       -1.33
   3.           .08           -.03392       -1.90
   4.          -.18           -.00071       -0.31
   5.           .26            .00143        4.65
   6.          -.05           -.00999       -1.47
   7.          -.07           -.00281       -0.80
   8.           .38            .00593        3.86
   9.           .10            .00615        1.03
  10.          -.06           -.01202       -1.37
  11.           .32            .02596        3.84
  12.           .09            .01844        0.98
  13.           .14            .01154        1.13
  14.           .15            .00020        1.75
  15.          -.03           -.00229       -1.11
  16.           .09           -.00121       -0.59
  17.           .17            .03559        1.97
  18.           .13            .07427        1.20
  19.          -.19           -.25574       -2.61
  20.          -.08            .03609        1.17
  21.          -.09            .03885        1.21
  22.          -.36            .00219        0.18
  23.           .20            .03330        1.51
  24.          -.01           -.01733       -0.83
  25.           .04           -.02033       -0.74
  26.           .28            .01370        1.98
  27.           .06            .05924        0.76
  28.           .38            .00031        0.12
  29.           .45            .00894        3.23
  30.           .08            .00200        1.45

Intercept constant       -2.39336
Multiple correlation      0.72145
Std. error of estimate    0.44944
F multr. = 8.14
no other trait is essay length quite so dominant, though
it plays an almost equal role for Table V-7, where "creativity"
is the trin. When we consider how often creativity
is measured by tests of fluency and fecundity, we
are again struck by the obviousness of the relation.
Similar comparisons can be made for other traits and other
proxes.

If we rank order the top five contributors for each
trait separately, the results are interesting. For the
trait of ideas, we find the proxes, according to the absolute
value of Column D, to contribute in the following
order:

1st) length of essay 

2nd) frequency of dashes

3rd) frequency of commas 

4th) standard deviation of word length 

5th) number of paragraphs 

For the trait of organization, the order of contributory
proxes is:

1st) length of essay 

2nd) frequency of commas 

3rd) frequency of dashes 

4th) frequency of quotation marks

5th) standard deviation of word length 

For style:

1st) frequency of commas 

2nd) length of essay 

3rd) frequency of dashes 

4th) standard deviation of word length 

5th) frequency of hyphens
For mechanics:

1st) spelling errors

2nd) standard deviation of word length

3rd) proportion of prepositions

4th) frequency of commas

5th) frequency of dashes

For creativity:

1st) length of essay

2nd) frequency of commas

3rd) frequency of dashes

4th) number of paragraphs

5th) standard deviation of word length



And for the average of all five traits, calculated
for each essay, the order is as follows: 

1st) length of essay 

2nd) frequency of commas 

3rd) frequency of dashes 

4th) standard deviation of word length 

5th) spelling errors 
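
The ranking procedure behind these lists is simple enough to sketch: order the proxes by the absolute value of their Column D entries and keep the first five. The values below are the Column D entries of Table V-3 (ideas or content) as read from the source copy; the function name is hypothetical.

```python
# Column D of Table V-3 (trait: ideas or content).
t_values = {
    "Title present": 1.37, "Av. sentence length": -1.73,
    "Number of paragraphs": -1.79, "Subject-verb openings": -1.38,
    "Length of essay in words": 6.38, "Number of parentheses": -1.23,
    "Number of apostrophes": -1.73, "Number of commas": 3.73,
    "Number of periods": -0.01, "Number of underlined words": -1.14,
    "Number of dashes": 4.63, "No. colons": 1.03, "No. semicolons": 0.90,
    "No. quotation marks": 1.25, "No. exclamation marks": -1.26,
    "No. question marks": -0.56, "No. prepositions": 1.54,
    "No. connective words": 0.12, "No. spelling errors": -0.55,
    "No. relative pronouns": 1.09, "No. subordinating conjs.": 0.88,
    "No. common words on Dale": -0.22, "No. sents. end punc. pres.": 1.33,
    "No. declar. sents. type A": -0.60, "No. declar. sents. type B": -0.52,
    "No. hyphens": 0.99, "No. slashes": 0.66,
    "Aver. word length in ltrs.": -0.34, "Stan. dev. of word length": 3.17,
    "Stan. dev. of sent. length": 1.38,
}

def top_contributors(t_by_prox, n=5):
    """Names of the n proxes with the largest absolute t-values."""
    ranked = sorted(t_by_prox, key=lambda name: abs(t_by_prox[name]), reverse=True)
    return ranked[:n]

for rank, name in enumerate(top_contributors(t_values), start=1):
    print(rank, name, t_values[name])
```

Note that the fifth entry for ideas, number of paragraphs, enters with a negative sign; the lists above record rank of contribution only, not direction.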

There is obvious noise in any comparisons of such 
listings. In the first place, there is random error, 
which is considerably higher in calculating Beta weights, 
or these similar multivariate t-values, than in calculating 
bivariate relationships. In the second place, in the trins 
themselves there is a high degree of halo effect, as will 
be shown soon. We would expect the first of these problems 
to be exhibited in rather wild and unexplained loadings, 
that would not necessarily be replicated in cross-valida- 
tions. We would expect the second problem to be evident 
in the occurrence of some common proxes in all lists, as 
here we see word length to be an important correlate of 
all traits, and commas to be another. 




Nevertheless, there are ways in which the differences 
among these rankings are intuitively pleasing. Length of 
essay is of first importance for three traits, and second 
for a fourth. But on one list, essay length does not occur 
at all, and this one is the list for mechanics . On the 
other hand, for mechanics we find the only inclusion of 
spelling errors. The evaluation of mechanics is clearly a 
rather negative thing, in which mistakes count against the 
student, and it seems better for a student to be short and 
safe, than to be fluent. 

In summary, these tables furnish many interesting com- 
parisons for differential study of the contributions of 
the proxes to the central dimensions of ratings which were 
studied here. While there is considerable overlap of the 
important proxes, there is also some difference in weighting 
which increases the accuracy of prediction. 

The uniqueness of the traits . A constant danger in 
multi-trait ratings is that they may reflect little more 
than some general halo effect, and that the presumed dif- 
ferential traits will really not be meaningful. This danger 
is one reason for having eight judgments for each essay,
since it was predicted that the halo would be extremely 
large. And the evidence we have already seen, showing 
the relative contribution of the proxes to the prediction 
of the traits, supports this suspicion of a large halo 
effect. 

This halo is demonstrated in Table V-9, which shows 
the intercorrelations among the judged traits of the essays, 
as rated by eight teachers for each essay. From this table 
it is clear that mechanics is the most maverick trait, having
little to do with ideas, organization, or creativity,
but considerably more to do with style. We find a very 
large halo, or tendency for ratings to agree with each 
TABLE V-9

INTERCORRELATIONS OF THE TRAITS
OF JUDGED ESSAYS
(N = 256)

     TRAIT              1      2      3      4      5      6

1. Ideas                      .86    .86    .68    .89    .93
2. Organization        .86           .82    .69    .82    .91
3. Style               .86    .82           .83    .86    .95
4. Mechanics           .68    .69    .83           .65    .85
5. Creativity          .89    .82    .86    .65           .92
6. All traits          .93    .91    .95    .85    .92

other. It may be noticed that, because of the interdepend- 
ence of these ratings, and their mode of assignment by the 
judges, the intercorrelations here are in some cases actu- 
ally larger than the true reliability of the group ratings. 
Some reflection will show how this could be: once a general 
level of rating is assigned, reliable or not, it will carry 
all the traits along together with it. The variable "all 
traits" on Table V-9 refers to the average of the five 
traits for an essay, counted as equally important. Naturally,
the large correlations shown between "all traits" and
the individual traits are inflated by a part-whole relationship.
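
The size of the part-whole inflation can be gauged from the tabled figures alone. The sketch below rebuilds the covariances among the five traits from the intercorrelations of Table V-9 and the standard deviations of Table V-2, then compares each trait's correlation with the five-trait total against its correlation with the total of the other four traits. The function names are hypothetical, and the inputs are rounded to two or three decimals, so the results are approximate.

```python
import math

traits = ["ideas", "organization", "style", "mechanics", "creativity"]
sd = [0.640, 0.675, 0.619, 0.771, 0.641]           # Table V-2
r = [                                              # Table V-9
    [1.00, 0.86, 0.86, 0.68, 0.89],
    [0.86, 1.00, 0.82, 0.69, 0.82],
    [0.86, 0.82, 1.00, 0.83, 0.86],
    [0.68, 0.69, 0.83, 1.00, 0.65],
    [0.89, 0.82, 0.86, 0.65, 1.00],
]
k = len(traits)
cov = [[r[i][j] * sd[i] * sd[j] for j in range(k)] for i in range(k)]
var_total = sum(sum(row) for row in cov)           # variance of the trait sum

def corr_with_total(i):
    """Correlation of trait i with the sum of all five traits (part-whole)."""
    return sum(cov[i]) / (sd[i] * math.sqrt(var_total))

def corr_with_rest(i):
    """Correlation of trait i with the sum of the other four traits."""
    cov_rest = sum(cov[i]) - cov[i][i]
    var_rest = var_total - 2.0 * sum(cov[i]) + cov[i][i]
    return cov_rest / (sd[i] * math.sqrt(var_rest))

for i, name in enumerate(traits):
    print(f"{name:13s} with total {corr_with_total(i):.2f}   "
          f"with other four {corr_with_rest(i):.2f}")
```

The first column should reproduce the "all traits" row of Table V-9 to within rounding; the second column shows how much of each figure survives once the trait is removed from its own criterion.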

To test the uniqueness of the traits, James J.
Roberge and others performed the analysis of variance
shown in Table V-10. There is of course a huge variance 
between essays, and we also find a large variance between 
traits (explained by the mean differences we saw in Table 
V-2) . What is important in this Table V-10 is the signi- 
ficant trait-by-essay interaction, which demonstrates that 
there is a reasonably reliable profile displayed, and that 
indeed there is some "validity" in the different ratings. 
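
The F ratios in Table V-10 can be recomputed from its sums of squares and degrees of freedom. The sketch below assumes the usual split-plot arrangement, with essays tested against the between-judgments error and the trait effects tested against the within-judgments error; that choice of error terms is inferred from the table layout rather than stated in the text.

```python
ESSAYS, JUDGMENTS_PER_ESSAY, TRAITS = 256, 8, 5

# source: (sum of squares, degrees of freedom), as given in Table V-10
rows = {
    "between_essays": (3791.293, ESSAYS - 1),
    "error_between":  (4439.012, ESSAYS * (JUDGMENTS_PER_ESSAY - 1)),
    "between_traits": (84.212, TRAITS - 1),
    "trait_x_essay":  (805.089, (ESSAYS - 1) * (TRAITS - 1)),
    "error_within":   (2675.113, ESSAYS * (JUDGMENTS_PER_ESSAY - 1) * (TRAITS - 1)),
}

mean_square = {source: ss / df for source, (ss, df) in rows.items()}

f_essays = mean_square["between_essays"] / mean_square["error_between"]
f_traits = mean_square["between_traits"] / mean_square["error_within"]
f_interaction = mean_square["trait_x_essay"] / mean_square["error_within"]

print(round(f_essays, 3), round(f_traits, 3), round(f_interaction, 3))
```

With 1,020 and 7,168 degrees of freedom, an interaction F a little above 2 is far beyond conventional significance levels, which is the basis for the claim that the trait profile is reasonably reliable.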

It would be possible, of course, to extract the halo, 
and to work with the residual, and unique, trait variance 
for various prediction purposes. We chose not to do this 
for two reasons: In the first place, we would need considerably
more raters for each essay, since the residual
trait variance, after the halo was subtracted, would be 
far less reliable than the original rating, and would make 
a much less secure goal to simulate. In the second place, 
and more importantly, we were interested in simulating the 
real ratings actually given for a certain trait by real 
human judges with appropriate expertise. And when this is 
a primary goal, then the halo behavior is an appropriate 
part of the simulation target, whether "pure" or not. 



TABLE V-10

TRAIT BY ESSAY INTERACTION*

Source                      SS           df         MS          F

Between judgments        8,230.305      2,047
  Between essays         3,791.293        255     14.868       6.002
  Error between          4,439.012      1,792      2.477

Within judgments         3,564.414      8,192
  Between traits            84.212          4     21.053      56.412
  Trait x essay            805.089      1,020       .789       2.115
  Error within           2,675.113      7,168       .373

Total                   11,794.719     10,239

*This table is based upon essay evaluation in which each of
256 essays was judged by eight different judges during eight
different periods.
Judge viewpoints. One effort within this project to
improve the predictability of judgments was undertaken by
Herbert Garber and Robert Shostak (1967), and reported at
the Annual Meeting of the American Psychological Association.
Their work aimed at raising the multiple R, but by
purifying the trin, rather than by optimizing the proxes.
Much of this section is quoted or paraphrased from their
presentation.

It is well known that one source of error variance 
in a correlation study is the unreliability of the criterion. 
Indeed, working under the assumption of a multivariate 
normal population, it seems reasonable that in any random 
sample, errors ought to be distributed with equal frequency 
in all variables, including the criterion (Hays, 1963,
p. 573).

The inspiration for this particular approach came, 
in part, from work reported in Educational and Psychological
Measurement, volumes 26 and 27, by Naylor and Wherry,
and also from an article by Jackson and Messick (1961).
The first two investigators used a factor-analytic approach
to do what they call "capture rater policy". They used 
as subjects Air Force supervisory personnel. The latter 
pair described a similar technique for use in studying 
social perception of personal status. One procedure which 
is related to the one of Garber and Shostak is also reported
by Christal (1963).

The departure in the present section was first simply 
to find, from among the 32 reader-graders in Project Essay 
Grade, those clusters of readers who tended to agree with 
one another no matter what their policy. In this case, 
their revealed agreement would emerge from factor-analyzing 
judges, not essay grades. The next step after identifying
a "clear" cluster of judges was to use their individual unpooled
grades as the criterion in a new multiple correlation
computation to see if the multiple coefficient would
rise. If it would, then the demonstration had worked as
hoped. A set of judges had been revealed by the analysis 
who were to a larger extent "predictable" from a fairly 
mechanical, computer-executed count of what we have
labelled "proxes".

What Garber and Shostak wanted was to have a factor
analysis done on raters. Since as few as five essays,
in one case, had been read in common by a pair of raters,
it seemed impractical to proceed to the computation of an
intercorrelation matrix from a data matrix three-quarters
empty. However, Dieter Paulus arranged to have the
incomplete 256 by 32 score matrix processed at Cornell by
Larry Wightman. The Cornell program inserted appropriate
correlation values based on the varying numbers of
observations available for each essay. On the average, 16
were present. Thus prepared, the Cornell computer next
processed the judge score intercorrelations and computed
factor matrices both by the components and factor analysis
procedures. The eleven-factor matrix from the latter
computation produced about three or four fairly clear
clusters comprised of about as many individual judges.
By "clear" is meant a positive loading of over .65 on a
single factor and no other positive loadings greater than
.43. Negative loadings were ignored.
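
The "clear" criterion just stated can be sketched in a few lines. What follows is only an illustration, assuming a hypothetical judge-by-factor loading matrix: the function names and the sample loadings are invented for this sketch, and only the two thresholds (.65 and .43) come from the text.

```python
def clear_factor(loadings, high=0.65, low=0.43):
    """Return the index of the factor on which a judge is "clear", else None.

    A judge is "clear" when exactly one positive loading exceeds `high`
    and no other positive loading exceeds `low`. Negative loadings are
    ignored, as in the text.
    """
    strong = [i for i, v in enumerate(loadings) if v > high]
    if len(strong) != 1:
        return None
    others = [v for i, v in enumerate(loadings) if i != strong[0]]
    if any(v > low for v in others):
        return None
    return strong[0]


def clusters(judges):
    """Group the clear judges by the factor that defines them."""
    groups = {}
    for name, loadings in judges.items():
        factor = clear_factor(loadings)
        if factor is not None:
            groups.setdefault(factor, []).append(name)
    return groups


# Hypothetical loadings for four judges on three factors (not project data).
example = {
    "judge_a": [0.71, 0.20, -0.50],  # clear on factor 0
    "judge_b": [0.68, 0.41, 0.05],   # clear on factor 0
    "judge_c": [0.70, 0.47, 0.10],   # rejected: second loading exceeds .43
    "judge_d": [0.30, 0.66, -0.80],  # clear on factor 1; negative ignored
}
result = clusters(example)
print(result)
```

In practice the selection reported here was made by inspection; a rule of this kind would only make that inspection repeatable.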

One cluster comprised of four judges which met the
above criteria was selected by inspection. For each of
these judges, 64 essays and the rating he gave to each were
gathered and then served as input to the next step in the
project. A modification of the IBM Scientific Subroutine
Program (SSP) for System 360 on multiple regression was
used at the University of Connecticut to compute an
overall multiple R from the four readers and their essays'
prox scores. The result was a coefficient of .65. At
first blush this looks like a disappointing outcome, until
one is reminded that the multiple prediction is based in
this instance on an unpooled, unweighted criterion, so that
any disagreement among the four readers and any
dissimilarity in the proxes of the four more or less unique
essay sets they read both contribute to a lowered overall
reliability. For one must remember that among these four
highly correlated judges who were finally picked to be a
test case of a refined criterion, most pairs had read no
more than 1/4 of their essays in common. Therefore, to get
some reasonable basis of comparison, four other judges were
selected by a random procedure and their data were treated
in exactly the same way. This time, a multiple coefficient
of only .545 resulted. Comparing the respective
coefficients of multiple determination, .420 and .297, we
see that a bit over 12% more of the total variance in the
criterion scores has been accounted for when using the
selected judges. Putting it in terms of forecasting
improvement over the "random four" judges, we realize that
using the technique here described we have a 40%
improvement over "chance".
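
The arithmetic behind the "12%" and "40%" figures can be retraced as follows. This is merely a check on the reported numbers: the variable names are introduced here for illustration, and the coefficients of determination are taken as printed (.420 and .297) rather than recomputed from the rounded R of .65.

```python
# Coefficients of multiple determination as reported in the text.
r2_selected = 0.420  # the selected cluster of four judges
r2_random = 0.297    # the four randomly chosen judges

# Additional share of criterion variance accounted for.
gain = r2_selected - r2_random

# Improvement relative to the "random four".
relative = gain / r2_random

print(f"additional variance explained: {gain:.3f}")
print(f"relative improvement: {relative:.1%}")
```

The difference of about .12 is the "bit over 12%" of total variance, and the ratio of roughly .41 is the "40% improvement" quoted above.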

Let us restate what was done by Garber and Shostak.
There was a fixed sample of high school essays. A multiple
correlation coefficient of over .71 was computed when an
averaged rating obtained from eight randomly-drawn readers
from a 32-reader pool was utilized as the criterion, and 30
approximations to writing ability, such as length of essay,
number of commas, use of uncommon words, and standard
deviation of word length, served as the predictors. Next, a
factor analysis based on the unequal numbers of observations
for the 496 pairs of judges in the 32-judge pool was
calculated. From this analysis one of several sets of
variables (really judges), which were more or less clearly
identifiable by the simple structure criterion, was selected
as the new criterion for a multiple correlation computation.
However, this time the prediction would be much more
stringent. There could be no approximation to a "true"
grade for each essay in the usual test-theory sense of a
mean score from repeated sampling. Instead, the error term
would contain increased variance from two sources and to an
unknown degree. These error components were, on one hand,
from the relative lack of overlap in the actual essays read
mutually by four judges, as contrasted with the higher
number of "same" essays that eight judges from a 256-essay
"population" had read. The other error source was the lack
of the beneficial effects due to cancelling out random
errors which occurred when eight ratings were averaged for
each essay.

To get an estimation as to what had been gained from 
this method of judge selection, a comparison was made with 
a random selection of four other judges from the remaining 
28. Nearly 40% more predictability was found to be the 
estimated gain. A sampling distribution of multiple R's
could have been gathered, and thus a crude sort of signifi- 
cance approximation calculated, for the R obtained on the 
selected cluster and on the other untried clusters. How- 
ever, one would have serious misgivings about any genera- 
lizability of such findings to other samples from the same 
population of essay writers and graders. 

What are some implications from this study? In the
words of Garber and Shostak:

First, it has been shown empirically that, by
this technique, one specific small [step can be]
made toward the goal of increasing the multiple
R in essay grading by computer through selecting
of criteria on the basis of clusters of consistent
viewpoint among a random sample of readers.

Second, by some technique like stepwise
regression using such identified clusters,
knowledge may emerge about the essay evaluation
process itself.



And, third, as Davis (1965) suggested in his
critique of Project Essay Grade, we may by
the route marked out here avoid wiring stu-
dents to an "easy mass standard" writing style
since, instead of exposing merely a simplicity
in human writing behavior, we can begin to
uncover some sources of dissonance and raucous 
Sungs among the lions who rule the teaching 
of effective composition. 



The possibility outlined here has not yet been 
capitalized upon for the production of higher multiple-R 
coefficients, but it may be regarded as a tool for future 
investigation. It may in the future be extended to trait 
analysis as well as an overall dimension, if it proves 
feasible in later study. 

Trait prediction by machines. For our present stage
of development of a new discipline, perhaps the best
comparisons may still be those of the human expert
compared with the machine. As we have seen, eight expert
judges, randomly selected from a qualified panel of 32
such judges, read every essay and evaluated it for ideas,
organization, style, mechanics, and creativity. We were
particularly interested in including the last named,
because of the common objections encountered to this sort
of work by some teachers in the humanities.

From the beginning, humanists have often miscalculated
the difficulties in essay analysis, and imagined
that specific criticisms of punctuation and usage might
be easy to program for the computer, but that global
measures such as overall quality, style, or creativity
would be virtually impossible. In one sense, quite the
reverse is true: We have had prompt success in actuarially
simulating the ratings of these subjective traits,
as all of our data reported so far would suggest. Yet a
really sound decision about the correctness of usage of
a comma, or the agreement of subject and verb, is a
problem which presumes a great amount of analysis, and some
of the necessary background routines have not yet been
programmed for any project anywhere. We shall discuss these
problems later.

Surely, from this humanistic point of view, the most 
challenging problem of all would be to measure creativity, 
since by such reasoning a creative work, or original work, 
is by definition unlike the others, and is unique; and 
therefore it requires a recognition procedure which could 
not be programmed in advance. 

Obviously, the first step is to remove the problem
from the humanistic viewpoint and put it within a
viewpoint susceptible to behavioral analysis. In order to do
this, we must ask: How is creativity recognized? How do
we know when we have achieved it? The only possible answer
seems to be that a work is creative when people say it is
creative; there is no evaluative procedure above human
judgment for deciding whether something is imaginative,
or original.

But once again, we must appeal to behavioral science. 
If we use one judge (and the humanist, when pressed, will 
often designate himself as sole arbiter) , then we have a 
very uncertain criterion. We do not know to what extent 
the evaluations made by this judge will correspond with 
the "true" creativity in the work. Therefore, we must 
ask other judges to assess the work independently of each 
other and of the first one, and we must regard their judg- 
ments, in absence of evidence to the contrary, as equally 
valid, if they are equally "authorities" in such matters 
(however qualifications might be established) . We still 
do not know how well these judgments correspond with the 
"true" creativity in the work, but at least we can ascer- 
tain how well they correspond with each other. 



We must, in the end, assume that the population of 
all such expert ratings would indeed represent our best 
estimate of any such "true" creativity. Not to admit this 
would leave us in hopeless solipsism. And when we do admit 
such a criterion, and when such ratings are made of large 
numbers of essays, each of which may or may not possess 
"creativity," we are led to additional discoveries about 
the trait. And these discoveries are quite contrary to 
the usual humanist way of thinking. 

For the distribution of creativity turns out to be 
approximately normal, and approximately continuous. And 
it has (as we note in Table V-2) a standard deviation of 
.641 rating points, which is just in the middle of the 
five traits. That is, it does not appear to be a purely 
"qualitative" trait, which may at once be recognized for 
its presence or absence. Furthermore, to emphasize the
apparent continuous quality of the trait, the reliability 
of human judgment of creativity was the lowest for any of 
the five traits, as will be seen presently. 

In short, then, there is every reason to regard 
"creativity" as a criterion rating like any other. And 
to regard in the same way originality, imagination, and
other near synonyms used or implied by the instructions 
to the raters, shown in Figure V-1. Of course, the dis- 
tribution is in part a result of the instructions re- 
garding such distributions, but there was no apparent 
tendency by the teachers to force it into a yes-no pattern. 

The data for all five traits, then, are shown in 
Table V-11, which may represent the most complete state- 
ment yet about the comparative success of the basic proxes 
so far presented. Column A of course lists the traits by 
title, column B shows us the reliability of the pooled 
sum of eight independent judges, calculated for each trait. 
TABLE V-11

Computer Simulation of Human Judgments
For Five Essay Traits
(30 predictors, 256 cases)

A.                       B.          C.       D.         E.
Essay Traits             Hum.-Gp.    Mult.    Shrunk.    Corr.
                         Reliab.     R        Mult. R    (Atten.)

I.   Ideas or Content    .75         .72      .68        .78
II.  Organization        .75         .62      .55        .64
III. Style               .79         .73      .69        .77
IV.  Mechanics           .85         .69      .64        .69
V.   Creativity          .72         .71      .66        .78

NOTE:

Col. B represents the reliability of the human judgments of
each trait, based upon the sum of eight independent ratings,
August 1966.

Col. C represents the multiple-regression coefficients found
in predicting the pooled human ratings with 30 independent proxes
found in the essays by the computer program of PEG-IA.

Col. D presents these same coefficients, shrunken to eliminate
capitalization on chance from the number of predictor variables
(cf. McNemar, 1962, p. 164.)

Col. E presents these coefficients, both shrunken and
corrected for the unreliability of the human groups (cf. McNemar,
1962, p. 153.)



The results here are not very surprising: mechanics shows
the best agreement, and creativity the least, and this is
in accordance both with intuition and other work on ratings.
That is, English teachers are readier to agree on whether
a word is misspelled, or an improper verb form used, than
they are on whether a student's writing is original or
shows imagination. We remember that spelling errors,
inadequate as our list of misspellings is, nevertheless
correlated -.30 with mechanics, while length of essay
became a major contributor to creativity. It is clear, in
any case, that human judges have a more difficult time
with creativity than with other traits.

But what of the computer? Column C shows the raw
multiple-R coefficients, predicting these criteria,
unreliable as they are, from the prox measurements. Here we
see that mechanics enjoys no advantage; to the contrary,
it is more poorly evaluated by the computer than creativity
is, and organization more poorly evaluated still.
This relative standing, as we have seen, is contrary to
the intuition of the humanist about what is easy, and
what is hard, in the computer evaluation of prose.

Column D has made the reduction in MULTR which, as 
we have formerly discussed, is necessary to compensate 
for the capitalization on random error inevitable in 
multiple regression. These shrunken coefficients, then, 
have been found through statistical manipulation, or 
through lookup in Table IV-11(B), rather than through
empirical cross-validation. Again, mechanics is not
highest, and creativity not the lowest, of the shrunken
correlations.

Column E exhibits a transformation of Column D,
pumping up the correlation to compensate for the
unreliability of the criterion scores. Column E, therefore,
reflects the true population correlation which might be
expected from the 30 proxes under the case of perfectly
reliable judge ratings, and after eliminating the
capitalization on random variation in the proxes. Thus the
correlations in Column E are the best evidence to date about
what success we theoretically would have in predicting the
important qualitative dimensions of ideas, organization,
style, mechanics, and creativity, using only computer-
measured variables in the prose.
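
As a rough check on Column E, the classical correction for attenuation divides a validity coefficient by the square root of the criterion reliability. The sketch below assumes that Column E was obtained by applying that correction, for unreliability in the criterion only, to Column D. The figures are those of Table V-11, the function name is introduced here for illustration, and agreement with the printed column can be expected only to within rounding, since Columns B and D are themselves rounded to two places.

```python
from math import sqrt


def correct_for_attenuation(r, criterion_reliability):
    """Classical correction for unreliability in the criterion only."""
    return r / sqrt(criterion_reliability)


# (trait, Column B reliability, Column D shrunken R, Column E as printed)
table_v11 = [
    ("Ideas or Content", 0.75, 0.68, 0.78),
    ("Organization", 0.75, 0.55, 0.64),
    ("Style", 0.79, 0.69, 0.77),
    ("Mechanics", 0.85, 0.64, 0.69),
    ("Creativity", 0.72, 0.66, 0.78),
]

corrected = {
    trait: correct_for_attenuation(d, b) for trait, b, d, _ in table_v11
}

for trait, b, d, e in table_v11:
    print(f"{trait:18s} computed {corrected[trait]:.3f}   printed {e:.2f}")
```

Each computed value falls within about .01 of the printed entry, which is consistent with the assumed procedure but does not by itself establish it.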

For all five traits, we have seen an ability to pre- 
dict the "true" ratings with a rather surprising degree 
of accuracy. 

Summary. This chapter has broken down the evaluation
of essays into important dimensions, and has investigated 
strategies in predicting human judgments of these dimen- 
sions. New ratings were generated for 256 essays, with 
eight expert teachers, drawn from a sample of 32, inde- 
pendently grading each essay, on five traits commonly 
accepted as important. The judge intercorrelations, and 
trait differences, were shown. Then the chapter indicated 
how the proxes differentially contributed to the traits, 
so that spelling errors contributed to the evaluation of 
mechanics far more than they did to that of the other 
traits. Nevertheless, as one would expect from the halo 
effect demonstrated here by correlation and by analysis 
of variance, there was a great similarity in the lists of 
high contributors to the various traits. Some investiga- 
tion was made of refining the criterion by gathering to- 
gether similar judge viewpoints, and this possibility was 
recommended for further exploration. Finally, the overall
ability of the system to predict the various traits was
tested, and it was found that, contrary to what some might
argue, such presumably lofty and subjective traits as
creativity could be evaluated, using the present
strategies, as effectively as the presumably more objective
trait of mechanics. All in all, there did not appear
to be any general area of essay evaluation which seemed,
on any a priori grounds, beyond the possibility of automatic
evaluation and analysis.



CHAPTER VI 



PROBLEMS OF STATISTICAL IMPROVEMENT 
IN PREDICTION 

The prior chapters have reported on work done over 
more than two years in analyzing essays mechanically. As 
has been seen, the success to date has been striking, al- 
though in a number of ways the reported strategies are 
surely less than optimum. One of the possibilities for 
improvement would appear to be in a more sophisticated 
strategy of statistical prediction. This chapter tells of 
explorations made into seeking some system other than the 
basic linear one of most multiple regression programs. 

Much of this chapter is based upon a report made by the 
authors (Paulus and Page, 1967) at an Annual Meeting of 
the American Psychological Association, in Washington, D.C. 

The problem of linearity. A standard multiple
regression program calculates an equation of the type shown
as equation (1) in Figure VI-1. In this equation $b_1$ to
$b_{30}$ represent computer-calculated weights for each of
the proxes, $x_1$ to $x_{30}$. These weights are calculated
in such a way as to maximize the correlation between
$\hat{Y}$ (the predicted score) and $Y$ (the actual score or
rating). We found this correlation to be over .65 (that is,
on cross validation and after correction for attenuation).
As we have seen from prior chapters, the method works.
However, it does not work as well as it could, or perhaps
should.

Of the many ways one might attempt to improve statis- 
tically upon the method, this chapter will report two, 
since both are applicable to a wide variety of multivariate 
predictive problems, not only to the grading of essays. 
FIGURE VI-1

\begin{align}
\hat{Y} &= b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_{30} x_{30} + c \tag{1} \\
\hat{Y} &= (b_2 x_2 + b_1)\, x_1 + c \tag{2} \\
\hat{Y} &= b_1 x_1 + b_2 x_1 x_2 + c \tag{3} \\
\hat{Y} &= \sum_{i=1}^{30} b_i x_i + \sum_{i=1}^{30} \sum_{j>i}^{30} b_{ij}\, x_i x_j + c \tag{4} \\
\hat{Y} &= b_1 (x_1 - \bar{x}_1) + b_2 (x_2 - \bar{x}_2)
           + b_3 (x_1 - \bar{x}_1)(x_2 - \bar{x}_2) + c \tag{5} \\
\hat{Y} &= b_1 x_1 - \underline{b_1 \bar{x}_1} + b_2 x_2 - \underline{b_2 \bar{x}_2}
           + b_3 x_1 x_2 - b_3 x_1 \bar{x}_2 - b_3 \bar{x}_1 x_2
           + \underline{b_3 \bar{x}_1 \bar{x}_2} + c \tag{6} \\
\hat{Y} &= b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 - b_3 x_1 \bar{x}_2 - b_3 \bar{x}_1 x_2 + c \tag{7} \\
\hat{Y} &= (b_1 - b_3 \bar{x}_2)\, x_1 + (b_2 - b_3 \bar{x}_1)\, x_2 + b_3 x_1 x_2 + c \tag{8}
\end{align}



Since both employ only existing data, the problems
associated with the collection of further data are,
therefore, avoided. It is our belief that some of the same
problems will often haunt the workers with verbal data of
the kind in this project, and therefore this discussion has
relevance for other workers concerned with natural language
strategies.

The first of the two approaches deals with the use of 
simple two-way interaction terms in an attempt to increase 
predictability. The second approach deals with the examin- 
ation of the relationships between the various proxes and 
the criterion (the pooled ratings), with the thought of 
applying transformations to the proxes in an effort to in- 
crease the correlations between the proxes and the criterion. 
One of these uses interactions, then, and the second uses
transformations.

Interactions. One way in which one might conceptualize
an interaction term in a multiple regression equation is to
think of variable weightings of predictor variables. We
want the weights received by a given variable not to be a
function of that variable's correlation with the criterion
and the other independent variables alone, but also to be
a function of the subject's score on some other variable.
Equation (2) of Figure VI-1 will make this clear. Note
that the weight received by $x_1$ in this simple equation is
the quantity $(b_2 x_2 + b_1)$, a function of the variable
$x_2$ plus the constant $b_1$. Carrying out the indicated
multiplication we obtain in equation (3) the simple
cross-product of $x_1$ and $x_2$ which, along with the
appropriate weight, represents the interaction of $x_1$ and
$x_2$ on the criterion. Generalizing from this simple case,
we can see that any number of cross-products (i.e.,
interactions) may be included in a multiple regression
equation along with linear terms. Given our 30 proxes, then,
we can look at 435 two-way interaction terms in addition to
the 30 linear terms. This equation is of the form of
equation (4) of Figure VI-1.
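
The count of 435 is simply the number of unordered pairs among 30 predictors, 30 x 29 / 2. A minimal sketch of how such cross-product columns might be generated is given below; the predictor names and scores are placeholders invented for the illustration, not project data. The products are formed from whatever scores are passed in, and the discussion that follows in the text explains why those scores should first be expressed as deviations from their means.

```python
from itertools import combinations


def interaction_columns(proxes):
    """Build one cross-product column per unordered pair of predictors.

    `proxes` maps a predictor name to its list of scores, one per essay.
    """
    columns = {}
    for a, b in combinations(sorted(proxes), 2):
        columns[(a, b)] = [u * v for u, v in zip(proxes[a], proxes[b])]
    return columns


# Thirty hypothetical predictors measured on three essays.
proxes = {
    f"x{i:02d}": [float(i), float(i + 1), float(i + 2)] for i in range(1, 31)
}
pairs = interaction_columns(proxes)
n_terms = len(pairs)  # 30 * 29 / 2
print(n_terms)
```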

Before this equation can be calculated, however, two
preliminary problems must be considered. The first problem
is illustrated by equations (5) through (8) of that figure.
Assume that we want to predict some criterion score Y from
three independent variables: $x_1$, $x_2$, and the
interaction of $x_1$ and $x_2$, $x_1 x_2$. Equation (5)
gives the three-predictor equation with the independent
variables expressed in deviation form. In equation (6) the
indicated multiplications have been carried out. Underlined
terms are all constants and may, therefore, be absorbed in
the constant term "c", as shown in equation (7). After like
terms have been combined in equation (8) we find that
the weight assigned to $x_1$ is the quantity
$(b_1 - b_3 \bar{x}_2)$ and the weight assigned to $x_2$ is
$(b_2 - b_3 \bar{x}_1)$. We can see, therefore, that the
weights assigned to the linear terms in equation (8) are
distorted by the values of $b_3$, $\bar{x}_1$, and
$\bar{x}_2$. There are only two conditions under which this
distortion is not present: first, when $b_3$ equals zero,
and second, when the means of the linear terms are equal to
zero. Since the interaction term is included precisely
because the investigator does not believe that $b_3$ is
equal to zero, only one alternative remains: to set
$\bar{x}_1$ and $\bar{x}_2$ equal to zero. Further, and
this appears to be contrary to all intuition, the
correlation between the interaction terms and the criterion
is affected in a manner similar to the distortion of the
weights assigned to the linear terms, unless the means of
the linear terms are equal to zero. These distortions
generally tend to inflate the correlations between the
interaction terms and the criterion. Interaction terms,
therefore, generally appear to be more valid than they
really would be if the means of the linear terms had been
adjusted to zero. If higher order interactions are
considered, then the means of the lower order interaction
terms must be adjusted to zero before the higher order
interaction terms are calculated. Interestingly, the
multiple correlation coefficient is not affected by these
distortions; the crucial effect these distortions have is in
the researcher's interpretation of his results. Hence, as
the first step in working with interaction terms in the
predictive context outlined above, all linear variables must
be standardized to a mean of zero. Another way of phrasing
this dictum might well be: "No multiplication without
standardization".
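
The algebra of equations (5) through (8) can be checked numerically. The sketch below is an illustration only: the weights, means, and test points are hypothetical, and the constant that the text absorbs into "c" is written out explicitly so that the two forms can be compared term for term.

```python
def centered_form(x1, x2, b1, b2, b3, c, m1, m2):
    """Equation (5): predictors expressed as deviations from their means."""
    return b1 * (x1 - m1) + b2 * (x2 - m2) + b3 * (x1 - m1) * (x2 - m2) + c


def raw_score_form(x1, x2, b1, b2, b3, c, m1, m2):
    """Equation (8): the same prediction written in raw scores."""
    w1 = b1 - b3 * m2  # effective weight on x1
    w2 = b2 - b3 * m1  # effective weight on x2
    # Constants absorbed into "c" in the text, written out here.
    constant = c - b1 * m1 - b2 * m2 + b3 * m1 * m2
    return w1 * x1 + w2 * x2 + b3 * x1 * x2 + constant


# Hypothetical weights and means, chosen only for illustration.
params = dict(b1=0.5, b2=-0.2, b3=0.1, c=3.0, m1=12.0, m2=4.0)
points = [(10.0, 3.0), (12.0, 4.0), (15.5, 6.25)]

differences = [
    abs(centered_form(x1, x2, **params) - raw_score_form(x1, x2, **params))
    for x1, x2 in points
]

# The weight on x1 shrinks from 0.5 to about 0.1 once raw scores are used.
effective_w1 = params["b1"] - params["b3"] * params["m2"]
print(max(differences), effective_w1)
```

With these illustrative values the two forms give identical predictions, yet the apparent weight on the first predictor falls from .5 to about .1. This is the distortion of interpretation, not of prediction, that the text describes.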

The second general problem which must be resolved,
prior to the calculation of equation (4) mentioned before,
is the selection of useful interaction terms. "Useful" is
used here in the sense of an interaction term's ability to
increase the multiple correlation. As mentioned before,
we have, given 30 proxes, 435 possible interaction terms
which could be included in the equation. It seems clear
that not all of these interaction terms can be efficiently
used in our predictive context. The reason for this is
that, in cases where the predictors vastly outnumber the
subjects, the loss in validity on cross-validation of the
linear composite of terms becomes very, very great. A
method of selecting useful predictors from all of the
possible predictors needed, therefore, to be developed.
A standard method usually employed in situations of this
type is step-wise multiple regression. However, all
step-wise multiple regression computer programs which we
were able to find and to examine required that at least one
variable-by-variable matrix be stored in core memory of
the computer. Given the amount of data available here,
this would require a minimum of 200,000 core locations, too
many for the computers currently available. As an
alternative to the step-wise multiple regression procedure,
the following method was employed.



A simple correlation coefficient is calculated between
each of the independent variables and the criterion. The
absolute values of these correlations are rank-ordered.
The largest correlation is selected and the criterion is
predicted from that variable which yielded the correlation.
This is done for each subject. A new variable is then
created which is the difference between the predicted
criterion score and the observed criterion score. This new
variable has the property of being uncorrelated with the
independent variable which was just used. In other words,
we now have a variable which does not correlate with the
variable that was selected, nor with those portions of the
other independent variables which correlate with the
variable that was selected. In effect, a series of partial
correlations is calculated: the first one being a zero
order correlation, the next one a first order correlation,
etc. At each step, the original criterion is replaced by
the residual, the new variable, and the process is repeated
until the residual correlates no longer with any of the
remaining independent variables at some reasonable level,
say, .05. This method provides for a rank-ordering of
predictors. But the method has two weaknesses. First, the
method is not as powerful as its converse. (Ease in
programming, however, made the present method more
desirable at this time.) Second, the method does not really
allow for the selection of suppressor variables. This was
unfortunate, and the investigators still seek a solution.
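
The selection procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the project's program: the function names and the small data set are hypothetical, the residual is taken as observed minus predicted, and the loop stops as soon as the largest absolute correlation falls below the threshold of .05.

```python
from math import sqrt


def pearson(x, y):
    """Pearson r; returns 0.0 when either variable has (near) zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx < 1e-12 or syy < 1e-12:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def residualize(x, y):
    """Residuals of y after a simple least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sum((a - mx) ** 2 for a in x)
    # Sign convention (observed minus predicted) is an assumption here.
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def rank_predictors(predictors, criterion, threshold=0.05):
    """Return [(name, r with current residual), ...] in order of selection."""
    remaining = dict(predictors)
    target = list(criterion)
    steps = []
    while remaining:
        name, r = max(
            ((k, pearson(v, target)) for k, v in remaining.items()),
            key=lambda item: abs(item[1]),
        )
        if abs(r) < threshold:
            break
        steps.append((name, r))
        # Replace the criterion by what the chosen predictor leaves unexplained.
        target = residualize(remaining.pop(name), target)
    return steps


# Hypothetical data: x1 and x2 drive the criterion, x3 is unrelated.
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [1, 0, 0, 1, 1, 0, 0, 1]
x3 = [1, 1, 0, 0, 0, 0, 1, 1]
y = [3 * a + 2 * b for a, b in zip(x1, x2)]

steps = rank_predictors({"x1": x1, "x2": x2, "x3": x3}, y)
for name, r in steps:
    print(name, round(r, 3))
```

Only one predictor column and the current residual need be held at a time, which is the practical advantage over a program that must keep a full variable-by-variable matrix in core.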

An additional problem inherent in all predictor
selection techniques is that of cross validation. The
validity of a multiple regression equation will, of course,
almost always be highest for the sample in which the
equation was constructed, and lower in other samples or in
the population. As we have pointed out, formulas for
estimating the cross-validities of sample multiple
regression equations, such as the Wherry formula or the
Lord-Nicholson formula, do not apply with complete rigor to
situations where predictors were selectively chosen from a
large number of predictors. Therefore, cross-validation
estimates had best be established empirically. Thus, in
this research, the previously described procedures were
applied to a random sample of two thirds of our essays; the
remaining third was used as a cross-validation sample.

Table VI-1 presents the data obtained when this cross- 
validation was applied. Note that only nine interactions 
and linear terms were selected before the correlation be- 
tween the residual criterion and any of the predictors 
failed to exceed .05. These nine variables were entered 
into a standard multiple regression equation in order to 
obtain weightings and a measure of their combined predic- 
tive power. The obtained results are reported in Table 
VI-2. As expected, the multiple correlation is somewhat 
higher than the one obtained when the 30 linear terms were 
used. This increase, however, can't be evaluated until 
the equation is cross-validated and the amount of shrinkage 
has been discovered. Therefore, the obtained equation was 
applied to the remaining third of the sample, and the pre- 
dicted scores were correlated with the observed scores. 

The correlation coefficient which was obtained was .63. 

You will note that this coefficient is approximately the 
same as the shrunken coefficient which was obtained by 
using the 30 linear terms - the proxes alone. This seems 
to indicate that we can predict the grade an essay receives 
as well from nine variables as we could from the original 
30. Making use of interaction terms, therefore, does not 
allow us to predict any better, but rather to predict just 
as well using far fewer variables. The lack of increase in
predictability is puzzling and may perhaps be attributed to
the relative instability of the criterion. If the criterion
had been more reliable, this method would surely have yielded 
better results, for the reasons already explained in an 
earlier chapter. 



TABLE VI-1

RANK-ORDERING OF PREDICTOR VARIABLES

                                               Correlation with
Step  Variable                                 Residual Criterion

1     Standard Deviation of Word Length               .52
2     Number of Commas                                .23
3     Length of Essay in Words                       -.15
4     Interaction of # of Periods and
      # of Subject-Verb Openings                     -.15
5     Interaction of # of Periods and
      # of Declar. Sentences Type "A"                 .12
6     # of Dashes                                    -.11
7     # of Words on Dale List                        -.10
8     Interaction of # of Periods and
      # of Declar. Sentences Type                     .07
9     # of Connective Words                           .05

Transformations. The first step in dealing with the
relationships between each of the proxes and the criterion 
was to develop a short computer routine which would graph 
the relationship between each of the proxes and the cri- 
terion. Sample graphs for five variables are shown in 
Figures VI-2 to VI-6. It was hoped that by examining 
such graphs insights could be obtained which would aid one 
in selecting transformations to be applied to the proxes 
so as to yield higher correlations with the criterion. 

On these graphs, the x or horizontal axis represents 
the independent variable, the prox. The y or vertical axis 
represents the ratings which an essay received. A rating 
of 1 is the lowest rating an essay could receive; a rating 

of 5 was the highest. 

Each of the graphs was carefully examined in an effort
to determine if any reasonable curve might explain the
data better than a straight line. It appeared that for
several of the graphs this might well be the case. Examine,
for example, the graph for variable number 8 (Figure VI-2).
The curve indicated by the dotted line may well fit the
data better than the straight line. Both have been
indicated on that graph. In order to apply transformations
to the proxes sequentially, the following technique was
employed. A FORTRAN II program was written for the IBM
1620 computer (chosen locally for its auxiliary equipment
and accessibility) which allows a researcher to apply
real-time transformations to the data. The program
calculates means and standard deviations of both prox and
criterion, and the correlation coefficient between the two.
Next, the relationship between the two variables is plotted
via an IBM 1627 plotter. These plots are similar to the
ones in Figures VI-2 to VI-6, except that the points are
connected and that a complete plotting grid is supplied.
After examining the plot, the researcher can apply to the
data one of 14 transformations (or any combination of these
transformations). The entire process is then repeated until
the researcher decides to stop. The current values of the
variables are then punched out on cards. This process is
illustrated by Figure VI-7.

As the number of cases increases, this process becomes 
painfully slow on the 1620. (Compilation alone took about 
20 minutes, and for 200 cases some transformations required 
as much as 15 minutes.) As a result, Paulus converted this 
program to run on our time-sharing teletype console, which 
is connected with an IBM 7094 at the Massachusetts Insti- 
tute of Technology. Again, it appeared that the relative 
instability of the criterion limited the usefulness of this
approach.
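
The transform-and-recorrelate cycle can be sketched in a few lines. The report does not list its 14 transformations, so the small menu below is purely hypothetical, as are the function names and the data; the sketch shows only the general idea of comparing the prox-criterion correlation before and after each candidate transformation.

```python
from math import log, sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical menu; the report's own 14 transformations are not listed.
TRANSFORMS = {
    "identity": lambda v: v,
    "square": lambda v: v * v,
    "square_root": sqrt,
    "natural_log": log,
}


def compare_transforms(prox, criterion):
    """Correlation of each transformed prox with the criterion."""
    return {
        name: pearson([f(v) for v in prox], criterion)
        for name, f in TRANSFORMS.items()
    }


# Illustrative data in which the criterion is curvilinear in the prox.
prox = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ratings = [v * v for v in prox]

results = compare_transforms(prox, ratings)
best = max(results, key=lambda name: abs(results[name]))
for name, r in results.items():
    print(f"{name:12s} r = {r:.3f}")
```

A transformation chosen this way capitalizes on the sample at hand, so any gain would still need to be confirmed on a cross-validation sample.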

Discussion. To date, then, the Project has examined
some methodological problems dealing with nonlinearity in 
predicting grades on essays. So far, however, we have 
not been able to substantially increase predictability by 
these methods, beyond that obtained under a naive linear 
assumption. It is our feeling that this may be due to the 
instability of the ratings of the essays, and, of course, 
the lack of more sophisticated proxes. Given the loose 
ratings used so far, it seems relatively unimportant what 
combinations of proxes are used, what transformations are 
applied to .some of them, or what interactions are considered. 
The multiple correlation, after cross validation, appears 
to have stabilized at about .65. 

There are at least two general ways in which such
work may proceed in the future. The first is to recognize
that there are differences among raters, to attempt to
establish groups of raters empirically, and then to
describe the characteristics of these groups. Some steps
in this direction have been reported in the prior chapter.
Then multiple regression equations, employing the
previously discussed techniques, can be calculated for each
homogeneous group of raters. We have recently completed a
factor analysis of the 32 raters who rated our essays.
However, since not all essays were rated by the same
raters, we find that our data matrix contains more missing
data than existing data. So we suspect that at least some
of the factors which we empirically isolated are missing
data and/or content factors.

FIGURE VI-7

SAMPLE OUTPUT FROM TRANSFORMATION PROGRAM

     CURVE FITTING PROGRAM

     PROBLEM NUMBER 076

     THERE ARE 50 OBSERVATIONS

     PLEASE CHECK SENSE SWITCH SETTINGS AND PRESS START WHEN READY
     MEANS AND STANDARD DEVIATIONS

     MEAN OF X =  1.9800
     MEAN OF Y =  2.8890
     S.D. OF X =   .9637
     S.D. OF Y =   .5703

     THE CORRELATION COEFFICIENT BETWEEN X AND Y IS   -.0720
     IF STOP WRITE 9, ELSE 5
     1
     I AM READY TO ACCEPT ROUTINE NUMBER AND PARAMETER
     n 2
     MEANS AND STANDARD DEVIATIONS
     MEAN OF X =  6.4369
     MEAN OF Y =  2.8890
     S.D. OF X =  2.0449
     S.D. OF Y =   .5703

     THE CORRELATION COEFFICIENT BETWEEN X AND Y IS
     IF STOP WRITE 9, ELSE 5

     I AM READY TO ACCEPT ROUTINE NUMBER AND PARAMETER
     12 i
     MEAN OF X = 52.6452
     MEAN OF Y =  2.8890
     S.D. OF X = 10.8134
     S.D. OF Y =   .5703

     THE CORRELATION COEFFICIENT BETWEEN X AND Y IS    .1423
     IF STOP WRITE 9, ELSE 5
     i
     I AM READY TO ACCEPT ROUTINE NUMBER AND PARAMETER
     6
     MEANS AND STANDARD DEVIATIONS

     MEAN OF X =  6.4369
     MEAN OF Y =  2.8890
     S.D. OF X =  2.0449
     S.D. OF Y =   .5703

     THE CORRELATION COEFFICIENT BETWEEN X AND Y IS    .1621
     IF STOP WRITE 9, ELSE 5
     2
     END OF JOB

Note: All input is underlined.

A second general approach deals with differentially
weighting raters before composite scores are calculated.
This approach requires some judgment about the relative
validity of each rater. Since we have no essays which have
been rated by all of the raters, this poses some problems.
One approach seems promising. This involves factor
analyzing the raters and using their factor scores (or some
function of them) on the first principal component as
weights.

Summary. In general, the investigators feel that
workers with verbal data should be pleased but not
contented with the present state of the art, and with the
results obtained from linear regression analyses. They
should continue linear analysis for the time being, but
should take care, whenever in doubt, to cross-validate
the results. Further statistical optimization will probably
become profitable eventually, once larger changes have been
made in other aspects of the work.



CHAPTER VII



PHRASE LOOKUP AND ITS 
APPLICATIONS 

The work described in the chapters up to this point
has been limited by the computer program which has been
used, and which has been shown in Appendix A. While this
program, called PEG, is modular, mnemonic, and flexible,
it lacks any real convenience in looking up phrases. The
present chapter describes a phrase look-up algorithm to
accompany the program for essay analysis, and describes
some studies done with the algorithm.

The phrase look-up procedure. The phrase look-up
algorithm for this project was designed primarily by
Donald R. Marcotte, and formed part of his M.A. thesis
(Marcotte, 1966). Much of the present description is from
his thesis or from the related report by Marcotte, Page,
and Daigon (1967).

In one sense, of course, phrase lookup requires no 
special program. It is easy to insert in a FORTRAN program 
a conditional transfer of the form: 

      IF (WORD(I).EQ.X .AND. WORD(I+1).EQ.Y) GO TO ...

Here we have tested whether two words in a sequence of text
words match the two words of some phrase. If the first text
word in the sequence is not the same as X, then the test
has failed, and the GO TO will not be executed. And if the
first word is X, but the second text word is not the same
as Y, then again the test has failed, and the GO TO will
not be executed.

Such a test, however, lacks efficiency, and as a list
of phrases of interest becomes large, it would become very
cumbersome to program, organize, and alter. What is
desired is a procedure which permits search through a
simply presented list of phrases, a list which may be
regarded in the same way as the dictionaries in the main
analysis program. It is this need which the subroutine
PHRASE was designed to fill. Appendix C has the source
program listing for PHRASE.
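
The contrast between a test wired into the program and a search driven by a phrase list may be easier to see in a short modern sketch. The Python below is an illustration only, not a translation of PHRASE (which is in Appendix C); the function names and the three-phrase list are hypothetical.

```python
def hard_coded_test(words, i):
    """One phrase wired into the code, like the single FORTRAN IF above."""
    return i + 1 < len(words) and words[i] == "each" and words[i + 1] == "and"

def find_phrases(words, phrases):
    """Return (position, phrase) for every listed phrase found in the text.

    The phrase list is plain data, so it can be extended or altered
    without touching the search logic.
    """
    hits = []
    for i in range(len(words)):
        for phrase in phrases:
            parts = phrase.split()
            if words[i:i + len(parts)] == parts:
                hits.append((i, phrase))
    return hits

# Illustrative list; the real dictionaries are described later in the chapter.
PHRASES = ["each and every", "null and void", "in my opinion"]
sentence = "in my opinion each and every rule is null and void".split()
found = find_phrases(sentence, PHRASES)
```

Adding a new phrase here means adding one string to PHRASES, whereas the hard-coded form would need a new conditional for every phrase.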

In order to implement PHRASE, a skeleton copy of the
PEG program was used to assemble the sentences of the
essay being read, in the way already described. Also, the
main program was used to read the array of first words of
the phrases, and to read in the full phrase matrix.

One sentence is obtained from the essay being cor- 
rected. A phrase-within-quotation-marks (PWQM) counter, 
a PWQM indicator, and an adjusted word counter are set to 
zero. The PWQM counter is incremented every time a phrase 
is enclosed within quotation marks. The PWQM indicator 
provides a symbol, either 0 or 1, for punched-card output. 
The adjusted word counter eliminates unnecessary processing 
of words that have already been identified as part of a 
phrase. Since phrases of only two or more words are pro- 
cessed, the index indicating the number of words in the 
sentence is reduced by two, because the last word and end 
punctuation need not be processed. 

DO LOOPS are set up which cause the computer to cycle
automatically until certain criteria have been met. The
initial DO LOOP provides for the search of a sentence for
a word that belongs to an array of first words of phrases.
Prior to doing this, a test is conducted to determine if
the index indicating the ordinal position of the word in
the sentence is less than the value of the adjusted word
counter. If this index is less than the adjusted word
counter, no cycling occurs, since the word being analyzed
has already been processed, or it is the first word of the
sentence. If the index is equal to or greater than the
adjusted word counter, the word is processed. A provision
is made to eliminate the processing of both parts of a
natural-language word. This is necessary since a computer
word (on the IBM 7040) consists of only six letters, while
many words in the English language contain more. Therefore,
all natural-language words are represented by two
computer words. This means that it is possible to identify
the first part of an English word and also attempt to
identify the second part of the same word. This
possibility is eliminated by an appropriate test. The test
is made by dividing the index indicating the ordinal
position of the natural-language word in the sentence. If
the natural-language word has been processed previously,
then no cycling occurs, and the next computer word is
examined.
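
The two-computer-word representation can be illustrated with a small Python sketch. Several details here are assumptions made only for the example: blank padding of short words, truncation of anything beyond twelve letters, and a parity check on the cell index as one plausible reading of the "dividing the index" test. None of these is confirmed by the text.

```python
WORD_SIZE = 6  # characters per computer word, as stated for the IBM 7040

def to_computer_words(word):
    """Split one natural-language word into two blank-padded 6-character cells.

    Assumption: words longer than 12 letters are truncated.
    """
    padded = word[:2 * WORD_SIZE].ljust(2 * WORD_SIZE)
    return padded[:WORD_SIZE], padded[WORD_SIZE:]

def encode_sentence(words):
    """Lay a sentence out as a flat list of cells, two per word."""
    cells = []
    for w in words:
        cells.extend(to_computer_words(w))
    return cells

def first_halves(cells):
    """Yield (word position, first cell) pairs, skipping second halves.

    Only even cell indices start a natural-language word, so dividing the
    cell index by two recovers the word's ordinal position.
    """
    return [(i // 2, cells[i]) for i in range(0, len(cells), 2)]

cells = encode_sentence(["in", "my", "opinion"])
starts = first_halves(cells)
```

Skipping the odd-numbered cells is what prevents the second half of "opinion" from being mistaken for the start of another word.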

Because the computer cannot differentiate between
natural-language words and punctuation marks, a test is
conducted to determine whether the unit being analyzed is
a punctuation mark. If this is so, no cycling occurs;
but if the unit is not a punctuation mark, cycling does
occur.

As was noted earlier, each natural-language word needs 
two computer words. Therefore the second DO LOOP requires 
two comparisons for each word provided to it. These two 
comparisons result in the identification of the particular 
phrase for which processing occurs. 

After identifying the specific phrase, the adjusted 
word counter is incremented by two because two computer 
words have been processed. The index for the print-out 
array, RC, is set equal to two, and the two identified 
computer words are placed in the array, RC. The value of 
the row counter replaces the value of another row counter 
needed to process the phrase. 






The third DO LOOP provides for the comparison of each
natural-language word following the identified word with
each natural-language word in the specific phrase. During
each cycle a test is made to determine if the computer word
of the sentence is the same as the computer word of the
phrase. If it is, then the RC array index is incremented
by one and the computer word is placed in the array, RC.
If the words do not match, then a test is made to determine
if the symbol identifying the end of the phrase is present.
If so, then indices for the identification of the presence
of quotation marks are established. This is done in two
steps: (1) by replacing the first index with the ordinal
position of the natural-language word preceding the first
word of the phrase in the sentence, and (2) by replacing
the second index with the ordinal position of the
natural-language word succeeding the last word of the
phrase in the sentence. The second index may have one of
two values. This permits the identification of phrases that
are not only enclosed within quotation marks but also have
punctuation marks within the quotation marks. The first of
the above alternatives is examined, and if the phrase is
not enclosed solely within quotation marks then the second
alternative is employed. If neither alternative is correct,
then the phrase counter is incremented by one, and the
computer word following the last natural-language word in
the phrase in the sentence is cycled.

If the test for an ending symbol finds no match, then
the index for the array, RC, is tested to determine if
fewer than four natural-language words are in the array.
The fourth word is not tested because phrases consisting of
four words have no end symbol. If there are fewer than four
words in the RC array, then the original row index counter
is incremented and the next phrase is processed. This is
done because several phrases begin with the same word, and
it is necessary to examine all phrases having initial words
in common. Continual incrementing of the row index counter
occurs until all phrases beginning with the identified word
in the sentence have been analyzed. This also means that
one extra word will be analyzed, the initial word of the
phrase following the phrases that have been examined,
because the number of phrases beginning with the same first
word is not constant. That is, it is not possible to
determine in advance where the series of phrases beginning
with the same first word ends. Therefore the added
comparison is made.
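
The overall flow of the loops described above (find a first word, try every phrase that shares it, check for enclosing quotation marks, then skip the words already consumed) can be sketched as follows. This is a loose modern paraphrase and not the logic of the PHRASE listing: it uses whole words rather than six-letter cells, a lookup table instead of a row counter and end symbols, and it counts quoted and unquoted phrases together in one list. All names are hypothetical.

```python
PUNCTUATION = {",", ".", ";", ":", "!", "?"}

def scan_sentence(tokens, phrase_table):
    """Find listed phrases in one sentence of word and punctuation tokens.

    phrase_table maps a first word to the phrases (lists of words) that
    begin with it. Returns (found, quoted): the phrases found, and how
    many of them were enclosed in quotation marks.
    """
    found = []
    quoted = 0
    next_free = 0  # plays the role of the adjusted word counter
    for i, tok in enumerate(tokens):
        if i < next_free:
            continue  # already consumed as part of an earlier phrase
        for phrase in phrase_table.get(tok, []):
            end = i + len(phrase)
            if tokens[i:end] != phrase:
                continue  # try the next phrase sharing this first word
            found.append(" ".join(phrase))
            before = tokens[i - 1] if i > 0 else ""
            after = tokens[end] if end < len(tokens) else ""
            after2 = tokens[end + 1] if end + 1 < len(tokens) else ""
            # The closing quote may follow directly, or after one
            # punctuation mark placed inside the quotation marks.
            closes = after == '"' or (after in PUNCTUATION and after2 == '"')
            if before == '"' and closes:
                quoted += 1
            next_free = end
            break
    return found, quoted

table = {
    "each": [["each", "other"], ["each", "and", "every"]],
    "helping": [["helping", "hand"]],
}
tokens = ["he", "gave", '"', "each", "and", "every", '"', "one",
          "a", "helping", "hand", "."]
result = scan_sentence(tokens, table)
```

In this toy sentence the first candidate for "each" fails and the second succeeds, which mirrors the need to examine every phrase that begins with the same word.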

Once the phrases are identified, it is necessary to
record the information for output. The output is punched
on cards as well as printed on paper. There are two sets
of punched-card output: (1) the cards containing the
identification number of the essay, the identification
number of the phrase, the symbol indicating whether the
phrase is enclosed within quotation marks or not, and the
identified phrase, and (2) the cards containing the
identification number of the essay, the total number of
trite expressions used in the essay, and the total number
of trite expressions enclosed within quotation marks. The
printed output is an amalgamation of (1) and (2) above.

The final DO LOOP provides for the replacement of 
each word in the RC array by zero. 

An application to cliches. Beyond constructing the
described algorithm, the main purpose of Marcotte's study
was to find how important cliches may be in the computer
evaluation of student essays. Surely, according to
English texts, such patterns of writing would be presumed
to handicap an essay's evaluation, and might be expected
to correlate negatively with human judgments.

Background on cliches. A cliche has been defined by
Partridge (1962) as "...an outworn commonplace; a phrase,
or short sentence, that has become so hackneyed that
careful speakers and scrupulous writers shrink from it
because they feel that its use is an insult to the
intelligence of their audience or public." The searching of
essays for cliches is a tedious if not impractical task.
Certain cliches such as "each and every" and "null and
void" seem to blend into a sentence so that they are not
easily seen on the first reading. Second and third readings
are often necessary to identify the cliche or cliches in
the essay. The task, therefore, of spotting cliches seems
insurmountable when there are several hundred essays to be
examined, particularly when each essay has to be graded for
other factors such as creativity, mechanics, style,
organization, and ideas or content. The time required to
make just one very detailed reading and commentary, a
minimum of fifteen minutes (Daigon, 1966), is considerable,
but when two or three readings are required the time
multiplies greatly. Because cliches are clearly defined
word groups, a computer search strategy is very efficient.
Cliches can be stored in the computer and exact comparisons
made.

LaBrant (1949) has discussed the difficulty of being
sure when a cliche is hackneyed to the person using it,
and Fowler (1965) has pointed out that every cliche seems
fresh and novel at some time to the user. And Guth (1964)
has warned against the "overzealous avoidance" of phrases
which might seem trite, saying that "there is a not always
clearly distinct borderline between the hackneyed and the
idiomatic" (p. 194).

Partridge (1962), however, has approached the problem
more systematically by providing a rather extensive list
of cliches in dictionary form. He categorizes each cliche
into one of four groups:

     1. Idioms that have become cliches.

     2. Other hackneyed phrases.

        Groups (1) and (2) form at least four-fifths
        of the aggregate.

     3. Stock phrases and familiar quotations from
        foreign languages.

     4. Quotations from English literature.

Other noteworthy aspects of Partridge's dictionary are
definitive information and specific examples for each group
of cliches, and the annotation of some cliches to indicate
that these are considered particularly hackneyed or
objectionable. Furthermore, Warriner (1951) and Griffith
(1957, pp. 263-4) have supplied cliches not on Partridge's
lists, and still others have been supplied through the
personal advice of Arthur Daigon. Three hundred cliches
were included in this portion of the study, divided into
two groups of 150 each: (1) cliches considered by Partridge
to be "particularly offensive," and (2) others which were
presumably not so odious. These lists may be found in
Marcotte (1966, App. C), and will not be presented here.

Of the 256 essays examined, only 58 contained any
occurrences of the cliche phrases, and there were only 74
occurrences altogether. The number of different cliches
used was only 24, and these are listed, together with their
frequency of occurrence, in Table VII-1. An examination
of these shows a rather large loading on two phrases:
"finer things" and "in my opinion." When it is remembered
that this particular essay was on whether, in a student's
opinion, the best things in life are really free, it is
understandable why these should occur so often. And these
two phrases are seen to be pretty meaningless for any
general conclusions. Of Partridge's "particularly
offensive" phrases only eight were found, for a total
frequency of only 13. In general, the cliches actually
found do not seem necessarily very handicapping.

TABLE VII-1

TRITE PHRASES FOUND IN HIGH SCHOOL ESSAYS

     Cliche                        Frequency

     all in all                        3
     by the same token                 2
     common understanding              1
     each and every                    1
     finer things                     14
     first and foremost                1
     helping hand                      3
     high and dry                      1
     in my opinion                    19
     in the long run                   7
     let's face it                     1
     matter of fact                    1
     more or less                      3
     really and truly                  1
     reigns supreme                    1
     root of all evil                  2
     step by step                      1
     survival of the fittest           2
     this day and age                  5
     through and through               1
     through thick and thin            1
     to say the least                  1
     wishful thinking                  1
     work and no play                  1

This intuitive feeling is borne out by Marcotte's
statistical comparisons between those essays containing
cliches and those not containing them. The mean ratings
of the two groups (one with 58 essays, the other with 198
essays) were compared using a random-sample t-test,
one-tailed because of the natural assumption that the
non-cliche essays would presumably be superior. No such
evidence was found. To the contrary, for one trait (Ideas)
the difference between group ratings was even in the wrong
direction, and happened to account for the largest t-ratio
found (1.37). And none of the t-tests approached
significance.
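
For readers who want to see the form of such a comparison, a two-group t-ratio can be computed as in the Python sketch below. It assumes the pooled-variance (equal variance) form of the t-test, which the text does not specify, and the ratings shown are invented toy values, not the study's data.

```python
import math

def pooled_t(group_a, group_b):
    """Student's t for two independent samples, assuming equal variances.

    Returns (t, degrees of freedom). A positive t means group_a has the
    higher mean.
    """
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / std_err, df

# Toy ratings only: essays without cliches vs. essays with cliches.
no_cliche = [3.0, 4.0, 5.0]
with_cliche = [1.0, 2.0, 3.0]
t_ratio, df = pooled_t(no_cliche, with_cliche)
```

For a one-tailed test the sign of the t-ratio matters: a value in the direction opposite to the hypothesis, as reported for Ideas, cannot support it however large it is.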

We may infer that these particular lists of cliches,
which are apparently as authoritative as any, do not aid
in predicting whether an essay will be judged to have
superior ideas, organization, style, mechanics, or
creativity. Often findings of "no significant differences"
are depreciated as inconclusive, or uninteresting to
science. Here, however, where the data are drawn from a
naturalistic essay situation and evaluated by realistic
judges, such null-hypothesis findings seem to have great
relevance. The avoidance of hackneyed phrases is often a
subject of teaching in courses in composition, and this
study casts a considerable shadow over the importance of
the topic, at least in the secondary grades here sampled.

A search for psychological characteristics. Another
application of the phrase look-up algorithm was in a study
of what might be called quasi-psychological characteristics
of prose (Hiller, Page, and Marcotte, 1967). This study
was a combination of the strategies and methods used in
this overall project, together with some of the subjective
list-generation character of the General Inquirer (see
Stone et al., 1966).

Traits were postulated which Hiller called
"opinionation," "vagueness," and "specificity-distinctions,"
and for which he subjectively generated some phrase
dictionaries, for intended use with the PHRASE subroutine
already described. For the trait of opinionation, phrases
were listed such as "I feel," "I think," "in my opinion,"
"who can doubt," etc., and the list included such apparent
indicators of certainty as "all," "always," "beyond a
doubt," etc., since opinionation and such certitude were
believed to have something in common. All told, 130 items
were included.

The other traits were similarly generated from an
intuitive basis, supported by general admonitions in Strunk
and White (1965). "Vagueness" was believed by Hiller to
be indicated by such qualifiers as "probably," "usually,"
"a matter of opinion," "generally," etc. This category
of vagueness contained 60 items. And
"specificity-distinctions" was believed to be indicated by
words implying a specific, or concrete, point of view, such
as "analyze," "ambiguous," "exception," "distinction,"
"specifically," etc. This list contained 90 words or
phrases.

These phrase lists, then, were looked up in the 256
essays, and their correlations were studied with the same
five traits of essay quality. To eliminate the general
factor of length, the frequency of occurrences of such
phrases should properly be divided by the total number of
words of an essay, just as was done with other proxes.
The correlations of these new proxes with the five trins
are shown in Table VII-2. All correlations are in the
predicted direction, and a number of them are highly
significant, given the large number of essays represented.
At first glance, then, the findings of Table VII-2 seem to
lend some support for a kind of construct validity of the
three traits postulated.

TABLE VII-2

CORRELATION OF FIVE MAJOR TRINS
WITH "OPINIONATION," "VAGUENESS," AND "SPECIFICITY"
(N = 256)

     Trins           "Opinion."     "Vague."      "Specif."

     Ideas             -.17*         -.26*           .08
     Organization      -.20*      (illegible)        .15*
     Style             -.16*         -.22*           .10
     Mechanics         -.14       (illegible)        .10
     Creativity        -.14          -.32*           .04

     Means              9.1          15.0            2.0
     St. Deviat.        7.7           6.3            2.0

*Significant at the .01 level, with a one-tailed test.

Note: All correlations are based upon prox proportions, but
the means and s.d.'s are raw frequencies.
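
The length correction mentioned above (dividing each phrase count by the essay's total words before correlating) can be sketched in Python as follows. The function names and the four-essay data set are hypothetical and serve only to show the two steps; they are not drawn from the study.

```python
import math

def prox_proportion(phrase_count, total_words):
    """Phrase occurrences per word of essay; 0.0 for an empty essay."""
    return phrase_count / total_words if total_words else 0.0

def pearson_r(x, y):
    """Product-moment correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    if ss_x == 0 or ss_y == 0:
        return 0.0
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sp / math.sqrt(ss_x * ss_y)

# Invented example: raw counts, essay lengths in words, and trin ratings.
counts = [2, 4, 3, 8]
lengths = [100, 200, 300, 400]
ratings = [1.0, 2.0, 3.0, 4.0]

proportions = [prox_proportion(c, n) for c, n in zip(counts, lengths)]
r_raw = pearson_r(counts, ratings)            # inflated by essay length
r_proportion = pearson_r(proportions, ratings)
```

In this toy set the raw counts track the ratings largely because longer essays contain more of everything, while the proportions do not, which is the distortion the division is meant to remove.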



Unfortunately, further analysis leaves the question
very much in doubt, and the sum of evidence seems somewhat
more negative than positive. It can be remembered from
earlier chapters of this report that some of the largest
correlates obtained with most of the trins were those
proxes based on vocabulary: Dale common word list, average
word length, and standard deviation of word length.
These three are all presumably correlated with an inferred
"frequency-score" of a student: the more words he uses
which are infrequent, the more favorable will be these
various vocabulary measures, and the higher will be his
probable ratings.

These traits of "opinionation," "vagueness," and
"specificity" were generated by uncontrolled subjective
procedures, which would of course have no built-in
safeguards against correlations with these other important
proxes. Even the examples given here suggest a bias along
the frequency dimensions: "I," "my," "always," "all,"
"probably," "usually," strike one as fairly commonplace,
whereas "analyze," "ambiguous," "distinction," etc. are
drawn from a less frequent set of terms. This supposition
is borne out by the evidence in Table VII-3.

Here it is evident that the presumed dimensions are
well enough correlated with prior vocabulary proxes so
that the new evidence of correlation with essay quality
does not contribute substantially in any search for
construct validity for the new lists. If the lists happen
to strike a reader as persuasive, then the measures,
individual though they are, can be said to possess some
face validity. But apparently we still do not have any more
compelling evidence for their being important measures in
their own right. This is a problem that is common in
content analysis work. The problem is shared by the
"dictionaries" used in most of the General Inquirer work,
as we have already noted, and the General Inquirer was
clearly one of the two models for this sub-study.

TABLE VII-3

CORRELATIONS OF VOCABULARY MEASURES
WITH "OPINIONATION," "VAGUENESS," AND "SPECIFICITY"
(N = 256)

     Vocabulary            "Opinion."   "Vague."   "Specif."

     Dale list                .32         .16        -.14
     Aver. Wd. Leng.         -.43        -.06         .24
     St. Dev. Wd. Leng.      -.18        -.19         .19

Only proportions are used for the column proxes.

More important, from an essay-analysis viewpoint, is
the fact that the multiple-R for predicting essay quality
does not seem to be increased by these traits of
"opinionation," "vagueness," and "specificity." At first
they seemed to one worker to contribute some new variance,
but to date no cross-validation has shown significant
improvement in the prediction through use of these
hypothesized variables. This has some meaning for the major
future developments in essay analysis, as will be discussed
in a later chapter.

Correlative conjunctions. Some other types of routines
have been developed for sequences of words which may be
separated by other words. One worker, Alice Trailer, was
curious about the use of correlative conjunctions, such as
either...or, neither...nor, etc. Her reasoning was that
sentences utilizing phrasal, clausal, parenthetical, or
transitional elements would be indicative of a more
mature or sophisticated style. And devices which provide
means for coordination or subordination, such as
correlative conjunctions, might be expected to predict
human essay evaluations.

To test this relationship Miss Trailer used a lexicon
of 11 common correlative conjunctions, taking Pence (1947)
as a guide. For the 256 "free" essays, the resulting
frequencies of such correlative conjunctions are shown in
Table VII-4. Obviously, certain items dominated the usage
of the high school students concerned, especially
either...or and if...then, which together accounted for
more than half of the occurrences. And with the judged
quality of essays, these tiny frequencies had correlations
hovering around zero, with the highest for any trait being
a (nonsignificant) -.11 with rated creativity.

TABLE VII-4

DISCOVERED FREQUENCIES
OF CORRELATIVE CONJUNCTIONS

     Correlative                 Frequency

     either...or                     57
     neither...nor                   17
     both...and                      26
     not only...but also              7
     not only...but                   8
     if...then                       44
     although...still                 3
     although...yet                   0
     though...still                   0
     though...yet                     0
     since...therefore                0

     Total                          162

This particular investigation, then, explored one
small facet of language usage in high school essays. The
hypothesis that correlative conjunctions might furnish
additional clues to writing quality was not supported by
the data, but an algorithm was developed to permit the
searching for separated words and word clusters in the
text.
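
A search for separated word pairs of this kind might look like the Python sketch below. It is not Miss Trailer's routine, whose details are not given here. It assumes that both members of a pair must fall within the same sentence, handles only single-word members (so "not only...but also" is left out), and uses a hypothetical four-pair subset of the lexicon.

```python
# Illustrative subset of the 11-pair lexicon, single-word members only.
CORRELATIVES = [
    ("either", "or"),
    ("neither", "nor"),
    ("both", "and"),
    ("if", "then"),
]

def count_correlatives(words, pairs=CORRELATIVES):
    """Count each first...second pair within one sentence's word list.

    After a first member is found, the search runs forward to the nearest
    later second member; scanning then resumes after that point.
    """
    counts = {pair: 0 for pair in pairs}
    for first, second in pairs:
        i = 0
        while i < len(words):
            if words[i] != first:
                i += 1
                continue
            try:
                j = words.index(second, i + 1)
            except ValueError:
                break  # no partner remains in this sentence
            counts[(first, second)] += 1
            i = j + 1
    return counts

sentence = "either you pay or you leave and if it rains then we stay".split()
tally = count_correlatives(sentence)
```

A matcher this simple will also count pairs that are not grammatically correlative (for example "both" followed much later by an unrelated "and"), so any counts it produces should be read as upper bounds.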

Verb constructions. Another investigator was
interested in whether type of verb syntax would help predict
essay quality. Thomas F. Breen pointed out that many
textbook writers for composition teaching inveigh against
the use of the passive voice, and claim that the active
voice is almost always to be preferred (Gleason, 1965).
But one would believe that perfect tenses, since they
differentiate time, would characterize better writing
(Scott, 1960).

Breen therefore developed an algorithm which would
identify and count uses of the perfect tenses and the
passive voice. His strategy was to locate the auxiliary
verbs (forms of "have" or "be") and then look for a past
participle (the algorithm searched for an -ed ending, or
for membership on a list of 213 irregular past participles).
Two general exceptions were noted: If a form of "be" were
followed by a relative pronoun, then by a past participle,
it was not counted as a passive verb. (Example: "There
were many people (who) sent gifts.") Similarly, if a form
of "have" were followed by the word "to," then by a past
participle, it was not counted as a perfect form.
(Example: "Someday you will have (to) come here.")
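
The strategy can be made concrete with a rough Python sketch. It is an interpretation rather than Breen's algorithm: the look-ahead window (MAX_LOOKAHEAD) is an assumption, since the report does not say how far past the auxiliary the search ran, and the eight-word irregular list is a tiny stand-in for his list of 213.

```python
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
HAVE_FORMS = {"have", "has", "had", "having"}
RELATIVE_PRONOUNS = {"who", "whom", "whose", "which", "that"}
# Small illustrative subset of irregular past participles.
IRREGULAR_PARTICIPLES = {"been", "sent", "come", "given", "written",
                         "seen", "done", "taken"}
MAX_LOOKAHEAD = 4  # hypothetical window; not stated in the report

def is_participle(word):
    """Crude test: an -ed ending or membership on the irregular list."""
    return word.endswith("ed") or word in IRREGULAR_PARTICIPLES

def count_verb_forms(words):
    """Return (passive, perfect) counts for a list of lower-case words."""
    passive = perfect = 0
    for i, w in enumerate(words):
        if w in BE_FORMS:
            blockers = RELATIVE_PRONOUNS   # first exception
        elif w in HAVE_FORMS:
            blockers = {"to"}              # second exception
        else:
            continue
        for nxt in words[i + 1:i + 1 + MAX_LOOKAHEAD]:
            if nxt in blockers:
                break  # an exception applies, so nothing is counted
            if is_participle(nxt):
                if w in BE_FORMS:
                    passive += 1
                else:
                    perfect += 1
                break
    return passive, perfect
```

For instance, count_verb_forms("the gifts have been sent".split()) scores one perfect form ("have been") and one passive ("been sent"), while both of the example sentences quoted above score zero. The -ed test will misfire on words such as "need," which is a limitation of any rule of this kind.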

With the algorithm so developed, he found 367 occur- 
rences of the perfect tenses, and 1323 occurrences of the 
passive voice. Of these latter, 135 were believed to be 
accompanied by a possible agent of the passive verb, a form 
generally regarded as worse than passive verbs not accompany- 
ing such explicit agents. 



In general, Breen's hypotheses were not supported by
the data. When the raw frequencies of such occurrences
were correlated with the overall quality of essay, perfect
tenses had a mere .03 relation. Passive voice occurrences
had a correlation of .28 with essay grade, contrary to the
prediction. And passive voice occurrences together with
an agent had a correlation of .13 with essay grade.
Unfortunately, the investigator did not control for essay
length, which would expectably be correlated with these
occurrences, and the discovered correlations are therefore
harder to interpret than they might otherwise be. For
example, if passive occurrences have a high correlation
with essay length, and as we know essay length has a
substantial correlation with essay quality, then the
apparent correlation of passive occurrences with essay
quality might be an illusion, and the meaningful
correlation of the two variables might in fact be zero. And
there are other possible third variables which would
account for the apparent anomalies in the results. In any
case, the project is turning toward a deeper syntactic
analysis, as will be described in a later chapter.
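
The argument about a third variable can be illustrated with the standard first-order partial correlation. In the Python sketch below only the .28 is taken from the text; the two correlations with essay length (.70 and .40) are invented solely to show how an apparent relation can vanish once length is held constant.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_passive_grade = 0.28    # reported above
r_passive_length = 0.70   # hypothetical
r_length_grade = 0.40     # hypothetical

adjusted = partial_r(r_passive_grade, r_passive_length, r_length_grade)
```

With these assumed values the product .70 x .40 equals the observed .28, so the adjusted correlation is essentially zero. Other assumed values would of course leave a nonzero remainder.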

Parenthetical Expressions. The final substudy described
in this chapter was conducted again by Donald Marcotte.
Parenthetical expressions are frequently used asides in
writing. When properly employed, they are effective
devices even though they do not contribute measurably to
the over-all meaning of the sentence. The object of this
section of the study is to determine whether the students
used parenthetical expressions, and whether they used them
judiciously. If so, then correlations between grades given
on style and use of parenthetical expressions should be
significant, and students using parenthetical expressions
should receive higher grades than those not using them.

To test these hypotheses, we must first be able to
identify a parenthetical expression. Fortunately, a
parenthetical expression has two identifying features.
The first identifying feature is its required punctuation.
For example, Warriner and Griffith (1957, p. 580) state
that "If he [the writer] wishes the reader to pause, to
regard the expression as parenthetical, he sets it off;
if not, he leaves it unpunctuated." Three types of
punctuation marks are used in "setting off" the expression:
commas, parentheses, and dashes.

The second identifying feature of the parenthetical 
expression is its placement in the sentence. According 
to Summey (1949, p. 60), there are three positions: "...(1) 
preliminaries, standing at the beginning of sentences or 
sentence members, (2) parenthetical groups in intermediate 
positions — commonly called parenthetical expressions with 
further qualification, and (3) tags or end parentheses." 

With these two discernible cues, punctuation and 
position, and with a dictionary of parenthetical expressions, 
the computer can be programmed to identify these expressions 
in essays. The computer's dictionary consisted of 94 
parenthetical expressions obtained from the textbook sources 
cited earlier, and from the opinionation-vagueness list al- 
ready described. 
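
How the two cues might be combined with a dictionary is sketched below in Python. This is an illustration under several assumptions and not Marcotte's program: the four-item list stands in for the 94-expression dictionary, a dash is assumed to be typed as two hyphens, and the three position labels follow the beginning, within, and end scheme used in the results.

```python
import re

# Small sample standing in for the dictionary of 94 expressions.
PARENTHETICALS = ["however", "for example", "also", "therefore"]

def find_parentheticals(sentence, expressions=PARENTHETICALS):
    """Return (expression, position) pairs for expressions that are set off.

    position is "beginning", "within", or "end". An expression that is not
    set off by punctuation is ignored.
    """
    body = sentence.strip().lower().rstrip(".!?")
    hits = []
    for expr in expressions:
        e = re.escape(expr)
        if re.match(rf"{e}\s*,", body):
            hits.append((expr, "beginning"))
        elif re.search(rf",\s*{e}$", body):
            hits.append((expr, "end"))
        elif re.search(rf"(?:,|\(|--)\s*{e}\s*(?:,|\)|--)", body):
            hits.append((expr, "within"))
    return hits
```

For example, find_parentheticals("However, the best things are free.") reports "however" at the beginning, while "The best things are also free." yields nothing because "also" is not set off.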

Correlations and t-tests were used by Marcotte in 
the statistical analysis. First, correlations were computed 
to determine the relation between position of expression 
and grade given on style, and the relation between propor- 
tion of expressions used to number of sentences and grade 
given on style. Second, t-tests were used to determine 
if the group using parenthetical expressions received 
significantly higher grades on each of five traits (Ideas 
or Content, Organization, Style, Mechanics and Creativity) 
than the group not using parenthetical expressions. 



Less than half (n=112) of the students used paren- 
thetical expressions contained in the computer program dic- 
tionary. Of the 216 expressions found, 132 were used to 
introduce sentences, sixty-seven were used within sentences, 
and seventeen were used to end sentences. Also, commas 
accounted for the punctuation of 215 expressions; the re- 
maining expression was set apart by parentheses. 

Table VII-5 consists of a list of identified parenthetical 
expressions. Evidently, words like "also," "however," "no," 
"therefore," and the phrase "for example" are favorite 
items. 

Table VII-6 shows the results of one-tail t-tests. 
All five comparisons were significant at the .01 level. 
However, the largest t-value was for style, as was expected. 
Apparently, the use of parenthetical expressions (proper 
use, of course) has some bearing on the grades given on 
essays. 

Table VII-7 shows the correlations between position in 
the sentence and style. Also shown are the correlations 
between style and the ratio of the number of expressions in 
the essay to the number of sentences in the essay. 

Except for the end position of the expression, all 
correlations are significant at either the .01 or the .05 
level. 

A summary of the Marcotte results, then, is as 
follows: 

(a) One hundred twelve students used parenthetical 
expressions. 

(b) Two hundred fifteen expressions were set off 
by commas. 

(c) One expression was set off by parentheses. 

(d) No dashes were used to punctuate the expressions. 
TABLE VII-5 

PARENTHETICAL EXPRESSIONS USED 

Expression          Frequency  Beginning  Within  End 

after all                   1          1       0    0 
all in all                  1          1       0    0 
also                       15         11       0    4 
at least                    1          0       1    0 
for example                17         14       2    1 
for the most part           1          0       1    0 
furthermore                 1          1       0    0 
generally                   1          1       0    0 
however                    61         32      28    1 
I am sure                   2          0       1    1 
I believe                   2          2       0    0 
I suppose                   1          0       1    0 
I think                     1          0       1    0 
if possible                 1          0       1    0 
in addition                 1          1       0    0 
in conclusion               3          3       0    0 
in general                  2          0       2    0 
in my opinion               8          6       2    0 
it seems                    1          0       1    0 
likewise                    2          1       1    0 
maybe                       1          1       0    0 
more or less                1          0       1    0 
moreover                    1          1       0    0 
nevertheless                3          3       0    0 
no                         15         10       0    5 
obviously                   1          1       0    0 
of course                   7          4       3    0 
oh                          5          5       0    0 
on the other hand           8          3       5    0 
ordinarily                  1          1       0    0 
perhaps                     4          1       2    1 
probably                    1          0       1    0 
sometimes                   4          4       0    0 
still                       1          1       0    0 
that is                     3          3       0    0 
therefore                  15         12       3    0 
though                      4          0       3    1 
to be sure                  2          0       2    0 
too                         6          0       3    3 
usually                     2          2       0    0 
well                        7          6       1    0 
why                         1          0       1    0 
TABLE VII-6 

MEAN GRADE DIFFERENCES 
BETWEEN THE 
PARENTHETICAL AND NON-PARENTHETICAL GROUPS 

Traits              Difference             t      Probability 
                    between Means(a) 

Ideas or Content         1.94            3.05        <.01 
Organization             2.2[?]          3.42        <.01 
Style                    2.43            4.02        <.01 
Mechanics                2.71            3.57        <.01 
Creativity               1.66            2.60        <.01 

(a) Parenthetical minus non-parenthetical. 
TABLE VII-7 

CORRELATIONS BETWEEN 
POSITIONS OF PARENTHETICAL EXPRESSIONS 
AND STYLE 

Position                    Mean      S        r      P(a) 

Totals 
  Beginning                 0.516    0.863    0.23    <.01 
  Within                    0.262    0.599    0.21    <.01 
  End                       0.066    0.293    0.00    n.s. 
  Total                     0.864    1.154    0.28    <.01 

Proportions 
  Beginning/No. of Sent.    0.023    0.039    0.19    <.01 
  Within/No. of Sent.       0.013    0.031    0.11    <.05 
  End/No. of Sent.          0.003    0.013   -0.02    n.s. 
  Total/No. of Sent.        0.038    0.054    0.20    <.01 

(a) One-tail test. 
(e) Except for the end position, all correlations 
for position were significant. 

(f) Students using parenthetical expressions re- 
ceived significantly higher grades on all 
traits than those students not using the 
expressions. 

Again, it is wise to note some reservations about these 
findings. It is probable that all of the reported differences 
for parenthetical expressions are affected somewhat 
by essay length. If an expression occurs in one essay 
but not in another, it is likely that the essay in 
which it occurs is the longer of the two. This relationship 
could have influenced 
the significance levels of Table VII-6. The second half 
of Table VII-7 attempts to provide for this influence, by 
controlling for the number of sentences in the essays. 
Nevertheless, this is not a wholly satisfying control, 
since sentences containing parenthetical expressions might 
be presumed longer than sentences not containing such ex- 
pressions; and the factor of essay length might still be 
the major contributor to the observed relationship. Further 
multivariate study must be conducted to ascertain just how 
useful the discovery of these parenthetical expressions is 
going to be. 

However, one portion of the present finding does not 
appear subject to this criticism of length, and is also 
very pleasing from an intuitive point of view. This is 
the contrast found, in the bottom half of Table VII-7, for 
the various positions of the parenthetical phrases. The 
correlation with quality of the proportion of beginning 
phrases is .19; of the within phrases is .11; and of the 
end phrases a (non-significant) -.02. This order coincides 
very nicely with the general view that end expressions are 
weak, dangling, and anti-climactic, and that middle expressions 
too often interrupt and divide the sentence syntax. 
This work on parenthetical expressions deserves further 
attention. 

Summary. This chapter has made a significant extension 
in the facilities of essay analysis, by introducing a 
powerful and convenient phrase look-up subroutine, called 
PHRASE, written primarily by Marcotte, and tested with a 
number of sub-studies. One of these investigated the 
importance of cliches in predicting essay quality. It 
found that cliches were, first, surprisingly rare in 
student papers and, second, quite inert in 
their apparent influence on ratings by expert human judges. 

A second study also used the PHRASE algorithm, with some 
subjectively constructed dictionaries, to investigate hypothe- 
sized traits of opinionation, vagueness, and specificity in 
the same student essays. Although found to correlate in 
predicted directions with essay quality, these three traits 
did not, apparently, contribute important unique predictive 
variance to the ratings. Other studies reported here investigated 
correlative conjunctions and verb construction 
in an effort to find predictors of essay quality. And a 
final study showed a positive relationship between writing 
quality and the use of parenthetical expressions, and their 
position in the sentence where used. In general, these 
uses of phrase procedures had varying degrees of success 
in the search for the sources of essay quality, but to- 
gether they indicate the expanded utility of the essay 
analysis program. 
CHAPTER VIII 



ON-LINE ANALYSIS AND FEEDBACK 

As we have seen from the earlier chapters, this pro- 
ject has repeatedly demonstrated that a computer can read 
a student's essay and return a numerical rating which indicates 
the quality of the essay on one of a number of 
traits. These ratings have been found to be as reliable 
as those assigned by trained human judges. Since the 
computer can return such ratings, one might well ask, "What 
else can the computer return, given an essay as input? Can 
the computer make comments about a student's essay, and if 
so, on what can these comments be based?" 

In an attempt to answer these and similar questions, 
a computer program was developed. This program, the inter- 
active essay grader, instructs the computer to read a stu- 
dent's essay, to make a series of comments about the essay, 
and to allow the student to correct some errors which the 
computer found, all in conversation mode. 

We should make it clear at the very beginning that 
this program is not to be taken as a model of expert pedagogy. 
The program requires much refinement before it can 
be used in a real school situation. The primary purpose 
of the program is to illustrate some of the things that 
can be done, and to reveal some of the problems which were 
encountered in its development. Most of this work has been 
carried out by Dieter Paulus, with some assistance by 
Michael J. Zieky, and was reported in much the present form 
to the American Psychological Association (Paulus, 1967). 










Background. Since we are basically concerned with a 
simulation problem, simulating by computer the feedback an 
English teacher might provide for her students, a reasonable 
place to start would be in the examination of some of the 
comments an English teacher might make. 

First, the teacher might look at the content of the 
essay to see whether the student has demonstrated an under- 
standing of the required concepts and a knowledge of the 
required facts. At present, no attempt has been made to program 
the computer to comment on the content of the student's 
essay. (There is work now beginning in this field.) 

Second, a teacher may look at the general structure 
of the essay. Here the teacher might be concerned with 
the soundness of the inferences a student draws, whether or 
not the mode of expression used by the student is appro- 
priate, where the essay lacks clarity, or where a point 
needs further support. Here again, no efforts were made in 
this program to allow the computer to deal with these areas. 

A third task, and frequently the most important and most 
time-consuming one for an English teacher of elementary 
writing skills, is judging the appropriateness of the 
student's word usage, determining errors in declension, noting 
spelling errors, and so on. Comments relating to these 
areas are appropriately made if the writing of an essay is 
seen primarily as a drill exercise, and the student is 
asked to write many essays so that he may learn to avoid 
these errors. It is the comments a teacher writes relative 
to these types of errors that the present computer program 
attempts to simulate; for these comments are rather routine 
and take up much of the teacher's time and energy. If the 
computer can successfully take over this task, then it would 
be doing the teacher a tremendous service, as she could spend 
her time and energy in making comments of the first two types. 
The type of feedback which this program attempts to 
simulate is of the prescriptive sort: comments that tell 
the student to avoid certain usages and suggest certain 
alternatives. If the student's essay deviates too greatly 
from the norm, then the computer indicates to the student 
where potential problems may lie and suggests corrective 
measures. 

The program. The interactive essay grader is, with 
the exception of one short subroutine, written entirely 
in FORTRAN IV. The program was written on a remote teletype 
terminal, connected by telephone cable to M.I.T.'s 
IBM 7094 computer. A student who wishes to use the program 
simply types a code word on the console and the program 
begins to execute. At the appropriate point in the 
execution of the program, the computer asks the student 
to write the essay in natural language. The only restriction 
imposed on the student concerns special punctuation marks, 
owing to the limited character facility of FORTRAN 
IV. When a subject has completed the essay, he is instructed 
to type an asterisk. The computer then starts almost 
immediately to respond and to comment on the student's 
essay. 

As a language, of course, FORTRAN IV is not particularly 
well suited for natural-language computing. Therefore, 
the program in its present form is relatively inefficient 
and lacks elegance. Nevertheless, the computer requires 
only about twelve seconds of machine time to evaluate and 
to begin comment on an essay. Printing speed is, of course, 
considerably slower. 

For purposes of describing the program, it may be 
conveniently, though artificially, divided into five parts. 
These are (1) the grading routine; (2) the prescriptive 
comments; (3) comments based on actuarial characteristics 
of the essay; (4) the interactive spelling routine; and 
(5) the recalling and recording of data about the essay. 
Each of these will be discussed in turn. 

The program calculates a numerical grade for an essay 
by using a weighted sum of scores on eight variables. 
These variables were selected by a step-wise multiple regression 
process from the original 30 proxes used in the 
larger project. The eight variables which are included 
yield a multiple correlation coefficient of approximately 
.60 when used to estimate expert human ratings. The program 
takes the numerical grade and selects an appropriate 
comment from a list of comments. If, for example, the 
grade is quite high, the computer writes, "I think that 
you did quite well. Keep up the good work!" On the other 
hand, a very low grade calls for the response, "I don't 
think that you did at all well. Are you taking this assignment 
seriously?" Intermediate grades call for other comments. 
(Incidentally, if a student tries to fool the computer 
and types nothing but nonsense, the computer responds, 
"Stop wasting my time! If you don't stop playing around I 
will report you to your teacher.") These comments are used 
instead of numerical scores because they are presumably 
more meaningful to the student than, say, the number 2.8634. 
If teachers usually had time to write comments, they undoubtedly 
always would. The number or letter grade alone is 
primarily designed to save time. 
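
The grading step can be pictured with a short Python sketch. The original routine was written in FORTRAN IV and is not reproduced here; the function names, the cut-off grades of 4.0 and 2.0, and the wording of the intermediate comment are hypothetical, and the eight regression weights are not given in this chapter.

```python
def grade_essay(scores, weights, intercept=0.0):
    """Weighted sum of prox scores; the weights would come from the regression."""
    return intercept + sum(w * s for w, s in zip(weights, scores))


# Cut-offs are illustrative assumptions, listed from highest to lowest.
COMMENT_BANDS = [
    (4.0, "I think that you did quite well. Keep up the good work!"),
    (2.0, "(an intermediate comment would be selected here)"),
]
LOWEST_COMMENT = ("I don't think that you did at all well. "
                  "Are you taking this assignment seriously?")


def comment_for(grade):
    """Return the first comment whose threshold the grade reaches."""
    for threshold, comment in COMMENT_BANDS:
        if grade >= threshold:
            return comment
    return LOWEST_COMMENT
```

Keeping the comments in a table, separate from the arithmetic, would let a teacher reword them without touching the grading equation.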

The prescriptive comments are called by a binary-search 
phrase look-up subroutine which searches lists that have 
previously been entered into the computer's memory. Michael 
Zieky was primarily responsible for developing this portion 
of the program. The lists which can be searched by the 
computer may be of almost any length, limited only by the 
size of the computer. Since search time grows with the 
logarithm of the list length, rather than in direct 
proportion to it, these lists can 
grow to great lengths with only a trivial loss in computing 
time. For example, to search a list containing 16,000 words 
requires only one more comparison by the computer than to 
search a list of some 8,000 words. Hence the criticism that 
a particular word or phrase is not included in the 
list is quickly dispelled by simply including that word or 
phrase. 
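
The claim about 8,000 and 16,000 entries can be checked with a small Python sketch of binary search that counts its probes. This is a generic textbook version, not the FORTRAN subroutine itself, and integers stand in for the alphabetized words and phrases.

```python
def binary_search(sorted_list, target):
    """Return (found, probes) for a search of an ascending list."""
    lo, hi = 0, len(sorted_list) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                      # one probe of the list per pass
        if sorted_list[mid] == target:
            return True, probes
        if sorted_list[mid] < target:
            lo = mid + 1                 # discard the lower half
        else:
            hi = mid - 1                 # discard the upper half
    return False, probes


# Worst case: look for an item larger than every entry.
_, probes_8k = binary_search(list(range(8000)), 8000)
_, probes_16k = binary_search(list(range(16000)), 16000)
```

Here `probes_8k` is 13 and `probes_16k` is 14: doubling the list adds a single probe, which is why the lists can grow freely.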

The general classes of words and phrases included in 
this list and on which the computer comments are as follows: 
(1) taboo words, such as "aint" or "busted"; (2) misuses of 
case, such as "theirselves" or "to who"; (3) use of "of" 
for "have", "could of" or "should of", for example; (4) noun-verb 
disagreement, for example "both is" or "I are"; 
(5) misuse of homonyms, "their is", for example; (6) vulgar 
idioms such as "somewheres" or "that there"; and (7) double 
negatives such as "can't hardly" or "don't scarcely". As 
indicated before, it is only the researcher's knowledge 
and imagination that limits the classes and number of words 
or phrases to be included. 

If the computer finds an improper usage it prints a 
message. For example, if the word "irregardless" is encountered 
by the computer, it responds, "'Irregardless' is 
actually a double negative. If you examine the first and 
last syllables you will see why." If the student writes 
"should of" the computer responds, "When we speak quickly 
the word 'have' often sounds like 'of'. But it should never 
be written that way." If the student writes "busted" the 
computer responds, "Do you really think the past participle 
of 'break' is 'busted' or were you just being careless?" 
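
A minimal sketch of the usage check follows. It uses a Python dictionary in place of the binary-search subroutine described above, and the table entries are short labels drawn from the seven classes rather than the program's full messages; the function name and the two-word window are assumptions.

```python
# Short labels stand in for the program's full comments.
USAGE_TABLE = {
    "aint": "taboo word",
    "busted": "taboo word",
    "theirselves": "misuse of case",
    "could of": "'of' used for 'have'",
    "should of": "'of' used for 'have'",
    "their is": "misuse of homonym",
    "somewheres": "vulgar idiom",
    "can't hardly": "double negative",
}


def flag_usages(essay, table=USAGE_TABLE, max_words=2):
    """Return (phrase, label) pairs, preferring the longest match at each word."""
    words = essay.lower().replace(",", " ").replace(".", " ").split()
    flags = []
    for i in range(len(words)):
        for n in range(max_words, 0, -1):
            window = words[i:i + n]
            phrase = " ".join(window)
            if len(window) == n and phrase in table:
                flags.append((phrase, table[phrase]))
                break
    return flags
```

Extending coverage is then a matter of adding entries to the table, which mirrors the point that only the researcher's knowledge and imagination limit the list.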

Comments based upon actuarial characteristics of the 
essay are printed whenever some characteristic of the essay 
deviates too greatly from its normative use. For the time 
being, norms are based on a sample of 256 essays used in 
previous analyses and can readily be changed as the type 
of essay changes. 



These comments are generally stated so as to indicate 
to the student that there may be a problem with given aspects 
of his essay. The computer might say, for example, 
"Your sentences seem long and complicated. ..." and ask the 
student a question, or suggest how the difficulty may be 
overcome. That is, the computer indicates that there 
may be a problem and suggests that the student check to 
see whether or not a problem really exists. 
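
One way to picture the actuarial comments is as a comparison of each essay characteristic with its norm, expressed in standard-deviation units. The sketch below assumes this form; the report does not state how "deviates too greatly" was defined, and the cutoff of 1.5, the norm values, and the feature name are hypothetical.

```python
def actuarial_comments(features, norms, messages, cutoff=1.5):
    """Return messages for features lying more than `cutoff` SDs from the norm."""
    comments = []
    for name, value in features.items():
        mean, sd = norms[name]
        z = (value - mean) / sd
        if z > cutoff:
            key = (name, "high")
        elif z < -cutoff:
            key = (name, "low")
        else:
            continue                      # within the normal range
        if key in messages:
            comments.append(messages[key])
    return comments


# Hypothetical norm (mean, SD); the study's norms came from 256 essays.
NORMS = {"words_per_sentence": (18.0, 5.0)}
MESSAGES = {
    ("words_per_sentence", "high"): "YOUR SENTENCES SEEM LONG AND COMPLICATED.",
    ("words_per_sentence", "low"): "YOUR SENTENCES SEEM SHORT AND CHOPPY.",
}
```

Because the norms are held in a table, they can be replaced when the type of essay changes, as the text notes.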

The interactive spelling routine again utilizes a 
binary search to determine which words are misspelled. The 
present list includes some 750 words. First the computer 
prints a list of the misspelled words that it found, then 
it gives the student an opportunity to correct them. The 
student is asked to spell a given word correctly, and if 
he does so, the computer responds "That is correct. Very 
good." If the student continues to spell the word in- 
correctly, the computer first suggests that the student try 
again, then, if it is again incorrectly spelled, that he 
look the word up in a dictionary. If the student again 
makes an error, the computer finally suggests that the 
student go and seek his teacher's help; then it goes on to 
the next word. The computer determines whether or not the 
word is spelled correctly by looking the word up in a list 
of correctly spelled words corresponding to those spelled 
incorrectly in the spelling list. 
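
The sequence of prompts can be sketched as follows. The table pairs each anticipated misspelling with its correct form, as described above; the function name is an assumption, the four entries are taken from the sample output in Figure VIII-1, and the wording of the final "teacher" prompt is hypothetical.

```python
# Anticipated misspelling -> correct spelling (four entries for illustration).
MISSPELLINGS = {
    "favorit": "favorite",
    "exept": "except",
    "jewelery": "jewelry",
    "beleive": "believe",
}

PROMPTS = [
    "WOULD YOU PLEASE TRY AGAIN.",
    "WOULD YOU PLEASE LOOK THE WORD UP IN THE DICTIONARY AND TRY AGAIN.",
    "PLEASE ASK YOUR TEACHER FOR HELP.",   # hypothetical wording
]


def drill(word, attempts, table=MISSPELLINGS):
    """Return the computer's replies to up to three spelling attempts."""
    correct = table[word.lower()]
    replies = []
    for attempt, prompt in zip(attempts, PROMPTS):
        if attempt.lower() == correct:
            replies.append("THAT IS CORRECT. VERY GOOD.")
            return replies
        replies.append(prompt)             # escalate after each miss
    return replies
```

After the third miss the routine simply moves on to the next word, so `zip` stops at three attempts however many the student offers.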

There are several problems inherent in this procedure. 
First, the list of approximately 750 misspellings seems to 
be quite inadequate. This judgement is based on the exam- 
ination of a glossary of the words used in 256 essays written 
by high school students. It was discovered that only a 
fraction of the misspellings found in those essays appeared 
on the spelling list. But again, the list can be easily 
increased in length. Second, it is sometimes difficult to 
determine whether or not a word should be included in the 
spelling list at all, since some commonly misspelled words 
correctly spell some other words. For example, if the word 
'MUSSEL' is included as a possible misspelling for the 
word 'MUSCLE', then if the student really intended to 
spell 'MUSSEL' it will be counted as a misspelled word. A 
further problem involves a student who misspells a word in 
an unanticipated manner, in a plural form for example. 
Then the computer will not recognize the word as a misspelling. 
A partial solution to these problems may lie in the 
inclusion of an extensive dictionary of correct spellings 
along with given rules for forming plurals, possessives, etc. 
which may be applied by the computer. This approach is 
currently under investigation by Francis Archambault. 
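
The idea of a dictionary of correct spellings plus inflection rules can be illustrated with a few lines of Python. This is only a sketch of the general approach and makes no claim to represent Archambault's work; the function name, the four suffix rules, and the tiny lexicon are assumptions.

```python
def is_known(word, lexicon):
    """True if the word, or a simple base form of it, is in the lexicon."""
    w = word.lower()
    if w in lexicon:
        return True
    if w.endswith("'s") and w[:-2] in lexicon:          # possessive
        return True
    if w.endswith("ies") and w[:-3] + "y" in lexicon:   # story -> stories
        return True
    if w.endswith("es") and w[:-2] in lexicon:          # box -> boxes
        return True
    if w.endswith("s") and w[:-1] in lexicon:           # muscle -> muscles
        return True
    return False


LEXICON = {"muscle", "mussel", "story", "box"}
```

With such a lexicon both 'MUSCLE' and 'MUSSEL' are accepted as correct words, and any form not derivable from the lexicon (for example "mussle") is flagged, whether or not the misspelling was anticipated.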

The last part of the program deals with the recording 
of data about a student's essay for use in making comments 
on future essays by that same student and for reporting the 
student's progress, or lack thereof, to his teacher. These 
data are recorded on a disk and are always available to the 
computer. The computer can, therefore, look back to the 
student's previous performance and compare his present 
performance to that. For example, when commenting on a 
student's overall grade, the computer can add, "You did much 
better than last time. Very good!", or, if the student 
makes a greater number of grammatical or word usage errors, 
the computer may comment, "With respect to grammar and word 
usage you have done considerably worse this time than last 
time". Similar comments are made when the total number of 
spelling errors is reported. If a student makes the same 
spelling error in two consecutive essays, the computer comments, 
"By the way, you made this same error the last time 
that you wrote an essay for me. Please be more careful." 
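
A comparison of this kind needs only the stored record of the previous essay. The sketch below keeps each record as a Python dictionary rather than on disk; the field names, the grades in the example, and the wording of the repeated-error line are assumptions.

```python
def progress_comments(previous, current):
    """Compare two essay records and return the resulting comments."""
    comments = []
    if current["grade"] > previous["grade"]:
        comments.append("YOU DID MUCH BETTER THAN LAST TIME. VERY GOOD.")
    elif current["grade"] < previous["grade"]:
        comments.append("YOU DIDN'T DO AS WELL AS YOU DID LAST TIME.")
    if len(current["misspellings"]) > len(previous["misspellings"]):
        comments.append("THAT IS A GREATER NUMBER OF ERRORS THAN "
                        "YOU MADE IN YOUR LAST ESSAY.")
    # Misspellings that appear in both essays are singled out.
    repeated = sorted(set(current["misspellings"]) &
                      set(previous["misspellings"]))
    for word in repeated:
        comments.append("YOU MADE THIS SAME ERROR LAST TIME: " + word.upper())
    return comments


# Hypothetical records for two consecutive essays by one student.
last_essay = {"grade": 3.1, "misspellings": ["beleive"]}
this_essay = {"grade": 2.4,
              "misspellings": ["favorit", "exept", "jewelery", "beleive"]}
```

The same stored records could also be summarized for the teacher, since nothing in the comparison depends on the student being at the console.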

These data are also used to provide the teacher with 
feedback about a student's essays. By typing the appro- 
priate code word and code number on the teletype console, 
the teacher may obtain a summary of the grades on all of 
the essays a student has written, as well as other summary 
information about various errors the student has made in 
each essay that he has written. If the teacher wishes, 
she can obtain complete copies of essays any student has 
written. 

Figure VIII-1 shows some sample computer output. 
The computer's comments are designated by ordinary type. 
The student's essay and responses are in italics. The 
essay which the computer evaluated was written by Paulus 
to demonstrate some features of the program. 

Summary. A computer program has been developed which 
provides limited feedback to both the student and the 
teacher about a student's essay. Content and general 
stylistic considerations are not treated as yet, but 
the program now comments on some errors in grammar and 
word usage, as well as on certain actuarially derived vari- 
ables. A great deal of flexibility is provided for in- 
creasing the number of errors which the computer can recog- 
nize and comment upon. The program is, as of now, untried 
in real classroom situations. But it does illustrate a 
number of interesting possibilities and indicates that this 
general area is a fruitful area for further research. The 
computer can provide relevant feedback to the student about 
his essays. This feedback can deal with a variety of topics 
and can take many different forms. 

Future steps should eventually include refining 
the spelling routine, developing longer and 
more comprehensive lists of possible errors, refining and 
increasing the number of comments, and rewriting the entire 
program in a more suitable language, such as PL/1. Teachers 
should also have greater control over comments and when they 
are printed. Carl Helm's PROTRAN, or a similar language, 
appears to be admirably suited for such a purpose. 
FIGURE VIII-1 

(Note: The "student's" responses are in italics) 

I AM AN IBM COMPUTER. WHAT IS YOUR NAME PLEASE. Q 
JOHNNY H. DOE 

DID YOU EVER WRITE AN ESSAY FOR ME BEFORE. Q 
YES 

HELLO, JOHNNY H. DOE, IT IS NICE TO TALK TO 
YOU AGAIN. PLEASE TYPE THE CODE NUMBER ASSIGNED TO YOU THE 
LAST TIME WE TALKED. 

89 

THANK YOU. WOULD YOU PLEASE READ THE DIRECTIONS 
ON THE FRONT OF THE MACHINE NOW, AND BEGIN WHEN READY. 



THE NAME OF MY FAVORIT TELEVISION PROGRAM IS BATMAN, 
I WATCH IT EVERY CHANCE I GET EXEPT LAST WEEK WHEN OUR 
TV SET WAS BUSTED. THEN I COULDN'T SEE IT. 

THE WEEK BEFORE THAT THEY HAD A REALLY GOOD SHOW ON. 
THE CATWOMAN STOLE SOME JEWELERY SHE LIKED. SHE HID THE 
STUFF SOMEWHERES. I BELEIVE THAT IT WAS IN HER HIDE-OUT. 

BATMAN AND ROBIN AINT STUPID SO THEY SOON FOUND IT 
AND ALMOST CATCHED THE CATWOMAN. SHE GOT AWAY DURING THE 
FIGHT. THEY COULD HAVE FOUND HER LATER BUT THEY DIDN'T 
WANT TP GIVE THEIRSELVES AWAY. 

THEIR IS OTHER NEAT CROOKS ON THE PROGRAM TOO. LIKE 
THE PENGUIN THE RIDDLER AND THE JOKER. 

THEY ARE ALWAYS TRYING TO GET THE DYNAMIC DUO BUT 
THEY NEVER SEEM TO BE ABLE TO. 

* 



**7094****7094****7094****7094****7094****7094****7094**** 

HERE IS SOME INFORMATION ABOUT YOUR ESSAY. 
**7094****7094****7094****7094****7094****7094****7094**** 
I DON'T THINK THAT YOU DID VERY WELL. 

TRY HARDER NEXT TIME AND PAY CLOSER ATTENTION TO WHAT I 
TELL YOU. 

(YOU DIDN'T DO AS WELL AS YOU DID LAST TIME.) 



(IF YOU DON'T AGREE WITH THIS, OR WITH ANY OF MY 
OTHER COMMENTS, CHECK WITH YOUR TEACHER.) 

IN RESPECT TO GRAMMAR AND WORD USAGE, YOU 
HAVE DONE ABOUT AS WELL THIS TIME AS YOU DID LAST TIME 
YOU WROTE AN ESSAY FOR ME. 

DO YOU REALLY THINK THE PAST PARTICIPLE OF "BREAK" IS 
"BUSTED", OR WERE YOU JUST CARELESS. Q 

"SOMEWHERES" IS AN EXAMPLE OF POOR SPEECH HABITS 
SHOWING IN YOUR WRITING. THE WORD SHOULD BE 
PRONOUNCED AND SPELLED WITHOUT THE FINAL "S". 

I NOTICE YOU USED THE WORD "AINT". MANY PEOPLE STILL 
OBJECT TO IT. SEE IF YOUR TEACHER ALLOWS THAT WORD 
BEFORE YOU USE IT AGAIN. 

SOME VERBS ARE IRREGULAR BECAUSE THEY FORM THEIR 
TENSES IN A STRANGE WAY. "CATCH" IS SUCH A VERB, 
AND ITS PAST TENSE SHOULD BE "CAUGHT", NOT "CATCHED". 

THE WORD "THEIRSELVES" SHOULD BE REPLACED BY "THEMSELVES". 
SEE IF "THEIRSELVES" IS IN YOUR DICTIONARY. 

I SEE THAT YOU HAVE USED "THEIR" FOLLOWED BY A 
VERB. THE CORRECT FORM IS "THERE". WORDS THAT 
SOUND ALIKE BUT ARE SPELLED DIFFERENTLY ARE 
CALLED HOMONYMS. CAN YOU THINK OF ANY OTHERS. Q 



YOUR ESSAY IS CONSIDERABLY SHORTER THAN THE ASSIGNMENT 
REQUIRED. HAVE YOU FULLY DEVELOPED THE TOPIC. Q 

YOUR SENTENCES SEEM SHORT AND CHOPPY. COMBINE SEVERAL 
USING APPROPRIATE CONNECTIVES SUCH AS 'AND' OR 'OR', FOR 
EXAMPLE. 



YOU USED 5 PARAGRAPHS. 
WHY SO MANY. Q TRY TO REORGANIZE THE ESSAY SO THAT YOU 
DON'T USE AS MANY PARAGRAPHS. USUALLY, A PARAGRAPH SHOULD 
CONSIST OF SEVERAL SENTENCES, AND SHOULD PRESENT ONE 
COMPLETE UNIT OF THOUGHT. 

I NOTICED THAT YOU DIDN'T USE VERY MANY COMMAS. DID YOU 
SEPARATE ITEMS IN A SERIES BY COMMAS, USE COMMAS IN 
COMPOUND SENTENCES, USE COMMAS AFTER INTRODUCTORY PHRASES. Q 
PERHAPS YOU HAD BETTER GO BACK TO YOUR ESSAY AND CHECK 
THOSE THINGS. 



I AM NOT CERTAIN BUT I THINK THAT 
YOU MADE 4 SPELLING ERROR(S). 

THAT IS A GREATER NUMBER OF ERRORS THAN 
YOU MADE IN YOUR LAST ESSAY. PLEASE BE MORE CAREFUL. X 

HERE IS A LIST OF YOUR SPELLING ERRORS. 

FAVORIT 

EXEPT 

JEWELERY 

BELEIVE 

(THERE MAY BE OTHER ERRORS THAT I DID NOT FIND.) 

LET'S SEE IF WE CAN CORRECT THE ERRORS I FOUND. 

LOOK AT THE FOLLOWING WORD. 

FAVORIT 

PLEASE TRY TO SPELL THE WORD CORRECTLY. 

(TYPE ON THE NEXT LINE AND PUSH 'RETURN' ONE TIME.) 

FAVORITE 

THAT IS CORRECT. VERY GOOD. 

LET'S GO ON TO THE NEXT WORD. 

LOOK AT THE FOLLOWING WORD. 

EXEPT 

PLEASE TRY TO SPELL THE WORD CORRECTLY. 

EXCEPT 

THAT IS CORRECT. VERY GOOD. 

LET'S GO ON TO THE NEXT WORD. 

LOOK AT THE FOLLOWING WORD. 

JEWELERY 

PLEASE TRY TO SPELL THE WORD CORRECTLY. 

JEWLERY 

NO, THAT IS STILL NOT CORRECT. 

WOULD YOU PLEASE TRY AGAIN. 

JUWELERY 

I AM SORRY. THAT IS NOT CORRECT. 

WOULD YOU PLEASE LOOK THE WORD UP IN THE DICTIONARY AND 
TRY AGAIN. Q 

JEWELRY 

THAT IS CORRECT. VERY GOOD. 

LET'S GO ON TO THE NEXT WORD. 

LOOK AT THE FOLLOWING WORD. 

BELEIVE 

(BY THE WAY, YOU MADE THIS SAME ERROR THE LAST TIME THAT 
YOU WROTE AN ESSAY FOR ME. PLEASE BE MORE CAREFUL. X) 

PLEASE TRY TO SPELL THE WORD CORRECTLY. 

BELIEVE 

THAT IS CORRECT. VERY GOOD. 

WELL, I HOPE THAT TAKES CARE OF THE SPELLING ERRORS. 

PLEASE TRY NOT TO MAKE THE SAME ONES AGAIN. 

**7094****7094****7094****7094****7094****7094****7094**** 



WELL, JOHNNY H. DOE, IT WAS NICE TO 
TALK TO YOU AND TO READ YOUR ESSAY, 

I HOPE THAT YOU WILL COME BACK SOON TO WRITE ANOTHER 
ONE. MEANWHILE, PLEASE THINK ABOUT WHAT I TOLD YOU. 

GOOD BYE. 

**7094****7094****7094****7094****7094****7094****7094**** 

**7094****7094****7094****7094****7094****7094****7094**** 

PLEASE NOTE YOUR NEW CODE NUMBER WHICH IS 90. 

THANK YOU. 

DO YOU WANT TO WRITE ANOTHER ESSAY NOW. Q 
PLEASE ANSWER 'YES' OR 'NO'. (NO BLANKS.) 

NO 



EXIT CALLED. 
Even though much work remains, and many problems are 
as yet unsolved, the interactive essay analyzer designed 
by Paulus seems to have opened the door to research in a 
relatively new aspect of computer assisted instruction, 
one that allows the 
computer to assume a greater role than that of a "mechanized 
scrambled book". The computer begins to understand 
what it is told by the student and is able to respond 
intelligently to him. Such on-line work should eventually 
become an important area of application. 
CHAPTER IX 



CONCLUSIONS AND IMPLICATIONS 

Chapters I through VIII have discussed rationale, 
methods of empirical research, and various findings from 
the work to date. However, the investigators have recog- 
nized from the beginning the extreme newness of this 
study, and its vast potentialities for the future of educa- 
tional measurement and instruction. Consequently, part 
of the original charge of this project was to scan the 
field constantly for new opportunities of research and 
practice. Some of the recognized opportunities will grow 
rather directly out of the work so far accomplished within 
the project, but others will stem from synthesis with 
other work in related fields. Therefore this chapter will 
perform three functions: (1) It will summarize the preceding 
chapters and the major line of work within this 
project. (2) It will discuss work in tangential fields, 
and the general status of the disciplinary interface most 
appropriate to the future of essay analysis. (3) It will 
point out some appropriate directions for future work 
within the field of educational measurement and instruction, 
future work which may be closely related to this 
project. 



1. Summary of Work Completed 



Rationale. The basic strategies of the computer 
analysis of essays have all grown out of an attempted 
simulation of human ratings. The fundamental approach 
has been to seek a goal of automatic analysis of stylistic 
qualities in essays, and the techniques have been 
generally actuarial. That is, we have looked for a simulation 
of human, expert judgment of intrinsic qualities 
(trins), through an exploration of correlated, or approximate, 
variables (proxes), which could be made logistically 
available for computer measurement. 

When this general strategy was decided upon, there 
were various problems which needed to be solved: The 
subjects have largely consisted of Wisconsin High School 
students who, in 1962, wrote a series of essays under controlled 
conditions. (There have been other subjects not 
so intensively studied.) There was abundant information 
about the Wisconsin students. The data to be analyzed 
for proxes consisted of various sets of essays written by 
these students, as key-punched literatim for computer input. 
The criterion for success in computer strategy has 
consisted of the trins of expert human judges, first ratings 
for overall quality generated by four judges for each 
essay, and later ratings for ideas, organization, style, 
mechanics, and creativity generated by eight (different 
and independent) judges for each essay. 

The proxes themselves consisted of various computer
measurements hypothesized to have a potential relationship
to the trins sought after. Some of these were statistical
counts relating to length within the essay, and others were
measures of types of words used. Still others investigated
characteristics of sentence openings or other structures.
Thirty proxes, which were most extensively explored, largely
treated single words as units. Later proxes have treated
various patterns of phrases, both intact and separated. All
of these proxes were studied for possible correlation with
the trins of essay quality, either in bivariate or
multivariate relationships, and their ability to predict trins
is in some ways the backbone of the empirical work to date,
just as the development of the rationale, and of the
various programming and statistical strategies used, is
the backbone of the methodological work to date.
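The kind of measurement meant by a prox can be suggested by a minimal sketch. It is written in Python rather than the FORTRAN IV of the project, and the four measures and their names are assumptions chosen for illustration; they are not the thirty proxes of the PEG program.

```python
import re

def simple_proxes(essay):
    """Compute a few illustrative proxes from the raw text of one essay.

    The measures are hypothetical stand-ins: counts relating to length
    and punctuation, of the general kind described in the text.
    """
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "word_count": len(words),
        "avg_word_length": (sum(len(w) for w in words) / len(words)) if words else 0.0,
        "comma_count": essay.count(","),
        "avg_sentence_length": (len(words) / len(sentences)) if sentences else 0.0,
    }

sample = "Money is useful. It buys food, shelter, and books."
proxes = simple_proxes(sample)
```

Each essay would yield one such row of numbers, and it is these rows, not the text itself, that enter the correlational analysis.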

Findings. Chapter III specified hypotheses about
certain of the proxes, and described the computer program
(called PEG, listed in Appendix A), with some of its
features. Chapter IV explored the questions of reliability
and validity of the proxes, and showed the ability of the
computer strategy to predict the overall rating of quality
about as well as the average human judge. It also
discussed some of the ways in which the computer may be
superior to the judge, especially in adjusting the
"severity" and the dispersion of the grading system
according to any uniform, predetermined standard. On two sets
of essays, the computer program was able to reach
multiple-regression coefficients of .71. Also, one essay's proxes
were able to predict the judgments of other essays written
by the same student, to a MULTR of .62. A conservative
cross-validation of the program showed the ability to
generate large numbers of ratings which were
indistinguishable from those of the human judge. In sum, the proxes
contributed significantly, in the predicted directions, to
produce quite humanoid ratings of overall quality. And
the Paulus tables were convenient tools for such
multivariate analysis.
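For readers new to the multiple correlation (MULTR) reported here, the following sketch computes it for the simplest case of two proxes, using the standard closed form R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2). The function names and the toy numbers are assumptions for illustration only; they are not project data, and the project's own programs handled many more predictors.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def multiple_r(prox1, prox2, trin):
    """Multiple correlation of one criterion (trin) with two predictors (proxes)."""
    r_y1 = pearson(prox1, trin)
    r_y2 = pearson(prox2, trin)
    r_12 = pearson(prox1, prox2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(max(r_squared, 0.0))

# Toy data (hypothetical): essay length, comma count, and a judged rating.
length = [120, 250, 310, 400, 180]
commas = [3, 5, 4, 8, 2]
rating = [2, 3, 4, 5, 2]
multr = multiple_r(length, commas, rating)
```

The multiple correlation can never fall below the larger of the two bivariate correlations, which is why adding proxes raises the coefficient in the fitting sample and why cross-validation on fresh essays is the more conservative test.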

Chapter V made a major expansion in the program, by
moving the simulation strategies to a profile of scores.
The human ratings were those of 32 expert English teachers,
with eight judges evaluating each of 256 essays on five
major traits of writing quality, each spelled out carefully
according to accepted dimensions. The individual judges
were found to correlate only weakly with each other, but
there was a strong tendency to a halo effect, i.e., to great
uniformity of profile for any given essay judged by any given
rater. However, there was a sufficient profile consensus
for a significant interaction of trait by essay. The
proxes contributed differentially to the five traits and,
halo aside, there were interesting relationships shown.
For example, length of essay contributed highly to content,
organization, and creativity, but not at all to mechanics.
There was thus intuitive mutual support for the validity
of the ratings and of the computer system.

The intercorrelations of the traits showed coefficients
which were actually higher than the reliability of
the individual traits, a surprising finding but an
understandable one in view of the halo tendency, and the
relative independence of the reliability. Some effort was
made to cluster common judge viewpoints into a purer
criterion, for purposes of simulation, and implications of
this work were discussed. A most interesting comparison
of this chapter was the relative ability of the computer
program to simulate the various traits. Although human
judges were much more reliable in judging mechanics than
in judging any other trait, and somewhat less reliable in
judging creativity, the computer program displayed no such
handicap, and did as well with the more subjective,
"qualitative" dimensions as with any.

Chapter VI made some studies of the problem of
nonlinearity of prediction in such multivariate simulation.
Clearly, some of the prox distributions were odd ones, and
their relations with each other, and with the criterion,
were irregular. The two methods of correction explored were
interaction terms and transformations of the proxes. For
various reasons, these were not successful in increasing
the overall cross-validated multiple regression, and for
practical purposes the linear assumption remained a
powerful and useful one, even where it was not exactly true.
Some useful programs were developed for displaying bivariate
relationships and for modifying variables systematically.

Chapter VII expanded the work of the computer
programming to analysis of text strings of more than one
word in length. A phrase lookup algorithm was listed as
an adjunct to the main program, and was used in a number
of substudies. One of these explored the essays for the
presence of standard cliche phrases. It did not find them
in common or injurious use, and where they did occur their
presence seemed uncorrelated with essay quality. Another
substudy used the same algorithm to locate phrases
believed to characterize student traits of opinionation,
vagueness, or specificity. As predicted, the first two
were found negatively correlated, the last positively
correlated, with essay quality, but the significance could
probably be accounted for by third variables of word
commonness which distinguished the lists. Other substudies found
null relationships between essay quality and correlative
conjunctions (for one investigator) and verb voice and
tense (for another). One significant study also used the
phrase algorithm to examine parenthetical expressions, and
found them indeed, as might be predicted, related to essay
quality according to whether they occurred at the beginning
(good), middle (less good), or end (perhaps poor) of a
sentence. Such phrase lookup thus represented a step
upwards in the power of the analysis program.
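The idea of phrase lookup, including the position of a phrase within its sentence, can be sketched briefly. This is an illustrative reconstruction in Python under our own assumptions (intact phrases only, a simple word tokenizer, hypothetical function names); it is not the algorithm listed with the main program.

```python
import re

def tokenize(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def find_phrases(sentence, phrases):
    """Return (phrase, start_index, position) for each intact occurrence.

    position is "beginning", "middle", or "end" of the sentence, the
    distinction used in the study of parenthetical expressions.
    """
    tokens = tokenize(sentence)
    hits = []
    for phrase in phrases:
        target = tokenize(phrase)
        if not target:
            continue  # skip empty entries in the phrase list
        width = len(target)
        for start in range(len(tokens) - width + 1):
            if tokens[start:start + width] == target:
                if start == 0:
                    where = "beginning"
                elif start + width == len(tokens):
                    where = "end"
                else:
                    where = "middle"
                hits.append((phrase, start, where))
    return hits

hits = find_phrases("In my opinion, money is, of course, useful.",
                    ["in my opinion", "of course"])
```

Counts of such hits, tallied per essay and per position, would then serve as additional proxes in the regression.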

Finally, Chapter VIII implemented an on-line,
interactive program to demonstrate the potential practical
uses of such a system for eventual classroom applications.
The program works at a time-sharing console, and is written
in FORTRAN IV, like the other programs here reported. It
greets the student and defines the essay assignment. When
the student has finished his essay and signaled his
completion, the computer (IBM 7094) begins in about 10 seconds
with diagnosis, evaluation, drill, and advice. The
algorithms were largely ad hoc and specific to certain
narrow classes of errors. Much basic work is needed for
a truly flexible system. Yet the program should help
demonstrate that there is nothing in principle about the
computer which will prevent a vast range of essay-analyzing
applications in the future.

In short, the chapters up to this point have described
the actuarial rationale, the deliberate limiting of focus,
the implementation of computer algorithms, the
construction of suitable criteria, and the empirical results of
the current state of the art of automatic essay analysis.
These chapters have also explored some statistical
possibilities, various additional proxes, and some on-line
token implementations in simulated settings. The remainder
of this final chapter will consider certain additional
possibilities of interest in the work of contemporary
scholars, and will point out some possible directions for the
most promising future investigation of the lines here begun.



2. Some Work Related to the Project



Since the inception of Project Essay Grade, much work 
has gone on in areas related to the project. The investi- 
gators have made additional explorations into related dis- 
ciplines, and have kept constant contact with them. For 
future investigators in automatic essay analysis, some 
knowledge of this outside but related work is essential, 
if they are to avoid the terrible expenses of redundance or 
ignorance. Therefore, this section will briefly describe 
some of this related work. 

Journals. The related disciplines continue to grow
rapidly in activity. Two journals have appeared which
capitalize on the potential relevance of computation for
language processing in traditional scholarship. One of
these is Computers and the Humanities, since 1966 a quarterly
edited by Joseph Raben at Queens College. A larger quarterly
is coming out in early 1968, Computer Studies in the
Humanities and Verbal Behavior, published by Mouton Press with an
interdisciplinary editorial committee. (The first author
here is the editorial advisor for education.) And The
Journal for Educational Data Processing shows interest in
natural language.

Societies. Organizationally, a great deal is happening.
The Association for Educational Data Systems (AEDS) is only
peripherally interested in natural language, but its
involvement seems to be increasing. The Association for Computing
Machinery (ACM), a very vigorous and strong organization of
computer scientists numbering over 20,000, has a great deal
of interest in relevant fields. It has a special interest
committee for artificial intelligence (SICART), which is
changing to established group status, and a group for
information retrieval (SIGIR). And it has a newly forming
committee for language analysis and studies in the
humanities (SICLASH) which already has a substantial initial
membership. The American Documentation Institute (ADI) has
recently changed its name to the American Society for
Information Science (ASIS), and has a keen interest in
many areas overlapping this project. All of these
societies put out newsletters, journals, or both. Perhaps the
most acutely relevant body is the Association for Machine
Translation and Computational Linguistics (AMTCL), which
publishes its own journal and a useful newsletter called
The Finite String. This group holds its own meetings in
conjunction with ACM and the Linguistic Society of America,
and has participated in two international conferences in
the field.

The oldest societies within the humanities, such as
the Modern Language Association (MLA), are notoriously
tradition-bound, but even in the MLA a computer group is
establishing a fairly permanent event at the Annual Meeting.

Besides AEDS, the educational and behavioral societies
have indicated a growing interest. The pre-session
training conferences held before the Annual Meeting of the
American Educational Research Association (AERA) have been
stimulating more sophisticated computer strategies for
some years (with sponsorship from the United States Office
of Education). These have been increasingly oriented
toward interactive work, especially in CAI, which has
strongly overlapping interests with natural-language analysis.
And in 1968 we conducted the first such workshop entirely
concerned with natural-language analysis for educational
research.

Textbooks. A discipline has difficulty in growing
rapidly until authors have defined it in suitable
textbooks. There are a number of such books which bear on this
work, though none is currently satisfactory for most courses
which are being conceived. Works edited by Garvin (1963)
and by Feigenbaum and Feldman (1964) have been mentioned
earlier in this report, and so has the older one authored
by Oettinger (1960) on machine translation. An excellent
related work is that by Becker and Hayes (1963) on
information storage and retrieval.

New arrivals include a rather descriptive book in
the humanities, edited by Bowles (1967), and an important
work in Automatic Language Processing edited by Borko (1967).
One of the most useful works, though not readable by any
but professionals, is a new book in computational
linguistics by Hays (1967). A forthcoming work on computers in
education, written by Allan B. Ellis, will surely feature
some natural-language work. And another forthcoming work
by Gerard Salton on information retrieval (due in 1968)
should be valuable to some workers in natural-language
analysis. A text by Veldman (1967) on FORTRAN programming
for behavioral scientists has one chapter on verbal data
which should prove very useful.

In general, materials suitable for instructing in
essay analysis can be pieced together from such works as
these, various programming texts, and works on statistics and
on linguistics. But the field still lacks a suitable
synthesis textbook for all introductory purposes, and work
may proceed without it for some time.

Other books. On the other hand, books which have some
more distant bearing on natural language seem to be growing
rapidly in number and quality, and should receive at least
brief mention. In theories of automata, the growth has been
especially brisk. Robert Korfhage (1966) has produced a
book which relates computation to recent and current
activities in mathematical logic, and the production languages
described have high relevance to context-free grammars and,
indeed, to the basic optimism about what computers may
accomplish. Marvin Minsky's book (1967) will surely open
the field of computation theory to many persons who would
otherwise not have made contact with it, and should thereby
produce indirectly much important practical and theoretical
work. And Taylor Booth (1967) has unquestionably produced
the most impressive compendium on automata theory so far.
Such activity has been going on before now, but has only
recently surfaced in such organized forms. In the field
called "artificial intelligence," we have already seen
that activity is growing with computer science. Carne
(1965) has one attempted synthesis of some central
concepts, and other, larger works are reportedly in
preparation.

At first glance, such works may seem irrelevant to
natural language processing, but the present writers do
not believe that they are. Rather, they serve to change
the way that computers are regarded, altering their image
from that of a slavish, pedestrian worker to that of a
universal machine. This seems to us a very important and
necessary change in the behavioral applications of computer
science.

Recent related work. Earlier portions of this report
discussed some related work in other disciplines. This 
section will comment on some recent lines of such develop- 
ment, which seem particularly meaningful. This will not 
attempt a complete coverage of such work, but will only 
indicate a few of what may be major lines of related in- 
vestigation, over a longer period. 

We have said that the work of Project Essay has so
far been actuarial in nature, leaning on statistical
relations between prox and trin more than on deterministic
strategies. Such statistical strategies should not be
underrated. As Sapir has written, "All grammars leak."
No matter how the future of such work develops, it is hard
to foresee a time when serious simulation will dispense
with a large probabilistic element. Yet Project Essay
wishes to push ahead with the deeper linguistic and
psychological dimensions as well, and to take maximum advantage
of any developments in these areas.

Parsing. In the linguistic world, there are certain
lines of investigation which seem very promising. One of
these is in context-free and other parsing schemes aimed
at syntactic analysis. Of all parsers constructed, the
most realistic one so far is the Oettinger-Kuno multiple
path predictive parser at the Harvard Aiken Laboratory.
The nature of current parsing systems is described in
a number of places (Garvin, 1963, pp. 223-232; Bobrow, 1967;
Hays, 1967, ch. 6), and will not be explained here in any
detail. Since Kuno's program enjoyed such eminence, we
were very interested in possible applications, and
Professor Kuno kindly arranged for parsing 50 sentences from
the Wisconsin essays. The results of this processing will
be briefly set forth, and illustrated.



Multiple Path System. In order to use the Kuno
parsing system, every word of the text must be found in a
"dictionary," that is, a list of words accompanied by
their possible syntactic roles, encoded in a way that is
useful to the system. The ordinary "noun" or "verb" is
not sufficient; there are various restraints on words
which are not adequately described by such broad
designations, and therefore such dictionaries need painstaking
construction. The Harvard dictionary is still quite limited,
and some of the common student words needed to be supplied
(as did all misspellings).



Figure IX-1 shows the result of looking up the words
of one student sentence in this special dictionary. This
sentence was: Money becomes a hindrance when it ceases to
aid in the attainment of one of the best things and becomes
a goal itself. Figure IX-1 shows many ambiguities in the
possible syntactic roles to be played by most of the words
of this sentence. Only "of" and "and" presented no homographs,
and "aid" possessed seven homographs to compete for the
correct parsing.
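The lookup step, and the way ambiguity multiplies across words, can be illustrated with a small sketch. The four entries below use codes as they are legible in Figure IX-1; the dictionary layout, the function name, and the misspelled test word are assumptions for illustration, and the product of homograph counts is only an upper bound on the combinations a parser might have to consider, not a count of actual parsings.

```python
from math import prod

# A tiny excerpt, with codes as legible in Figure IX-1.
DICTIONARY = {
    "money": ["NNNS", "MMMS", "NOUS"],
    "a": ["AAA", "ART"],
    "of": ["PRE"],
    "things": ["NNNP", "MMMP", "NOUP"],
}

def look_up(words, dictionary):
    """Split words into those found (with their homographs) and those missing."""
    found = []
    missing = []
    for word in words:
        codes = dictionary.get(word.lower())
        if codes is None:
            missing.append(word)   # must be supplied by hand, as misspellings were
        else:
            found.append((word, codes))
    return found, missing

found, missing = look_up(["Money", "a", "of", "things", "hindranse"], DICTIONARY)
combinations = prod(len(codes) for _, codes in found)
```

With three, two, one, and three homographs for the four words found, there are already 18 possible assignments of categories, which suggests why a full sentence can yield a great many competing parsings.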



FIGURE IX-1

COMPUTER LISTING OF HOMOGRAPHS FROM
THE PARSING DICTIONARY FOR A STUDENT SENTENCE

SENTENCE NUMBER 000024 CORPUS NUMBER

WORD          HOMOGRAPHS

MONEY         NNNS MMMS NOUS
BECOMES       VI2S VI3S VTIS
A             AAA ART
HINDRANCE     NNNS MMMS NOUS
WHEN          IAV C02 RL6
IT            TITS PRNS PRC
CEASES        VTIS VIIS
TO            TOIS PRE
AID           (seven codes, illegible in this copy)
IN            PRE AV2
THE           AAA ART
ATTAINMENT    NNNS MMMS NOUS
OF            PRE
ONE           NNNS MMMS NUMS
OF            PRE
THE           AAA ART
BEST          NNNC MMMC NOUC AVI AAA ADJ
THINGS        NNNP MMMP NOUP
AND           ICO
BECOMES       VI2S VI3S VTIS
A             AAA ART
GOAL          NNNS MMMS NOUS
ITSELF        PRO AVI
.             PRD

The next illustration, Figure IX-2, shows the first
parse performed by the Kuno predictive algorithm. To the
scholar unfamiliar with such work, this parsing may seem
a surprising example of artificial intelligence, for there
is a great deal about it which would correspond with the
analysis of a trained student of rhetoric. The first
column is of course the list of "terminal" symbols, i.e.,
the words of the manifest English sentence. The second
column is the "sentence structure." A little study will
give some clue to the way this may be read. All words
fall within the sentence "1", and we find the number 1
throughout. The word money, standing as a simple subject,
is only one syntactic step from the terminal representation,
and therefore we find only "1S" for structural designation.
On the other hand, the word a modifies hindrance, and
hindrance has the structural representation "1C" (where
C stands for "complement"). Thus the article (or "adjective")
a carries the designation "1CA". By such dependency
relationships we have the 12-symbol depth of the and best. These
words both modify things, which is the object of the
preposition of, which leads the prepositional phrase which
modifies the noun one, and so on back to the adverb clause
headed by when, which modifies the verb becomes, the second
word of the sentence. From the second column, one could
thus draw a tree diagram of the sentence syntax.
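How a tree might be drawn from the second column can be sketched as follows. The sketch assumes, from the 1C and 1CA example above, that each structure code spells out a path from the sentence node and that dropping its last symbol gives the code of the parent; that reading, the function names, and the indented layout are our assumptions, not a documented property of the Kuno output. Only the four codes discussed in the text are used.

```python
def build_tree(rows):
    """rows: list of (word, structure_code). Returns (children, labels).

    children maps each code to the sorted codes one symbol longer;
    labels maps a code to the words that carry it.
    """
    labels = {}
    nodes = set()
    for word, code in rows:
        labels.setdefault(code, []).append(word)
        for end in range(1, len(code) + 1):
            nodes.add(code[:end])          # make sure every ancestor exists
    children = {node: [] for node in nodes}
    for node in sorted(nodes):
        if len(node) > 1:
            children[node[:-1]].append(node)
    return children, labels

def render(children, labels, node, depth=0):
    """Indented listing of the subtree under node, one line per code."""
    text = " ".join(labels.get(node, []))
    lines = ["  " * depth + node + (": " + text if text else "")]
    for child in children[node]:
        lines.extend(render(children, labels, child, depth + 1))
    return lines

rows = [("money", "1S"), ("becomes", "1V"), ("a", "1CA"), ("hindrance", "1C")]
children, labels = build_tree(rows)
outline = render(children, labels, "1")
```

Printing the lines of `outline` shows the article indented beneath the complement it modifies, which is the dependency the codes are meant to express.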

The third column shows the particular syntactic
category of that word for this particular parsing. A glance
back to Figure IX-1 will show that all entries in this
column appeared as possible homographs in the earlier
output. And the fourth column is a verbal description of what
that category is. The fourth column, then, depends
completely on the third.



FIGURE IX-2

FIRST COMPUTER ANALYSIS OF SYNTAX
OF A STUDENT SENTENCE

NOTE: Analysis produced by the Kuno Multiple-Path Syntactic Analyzer

ANALYSIS NUMBER 1    SENTENCE NUMBER 000024    CORPUS NUMBER 01

ENGLISH    SENTENCE   SWC   SWC MNEMONIC       SYNTACTIC ROLE             RL NUM  PREDICTION POOL
           STRUCTURE

MONEY      1S         NOUS  NOUN 1             SUBJECT OF PREDICATE VERB  SENNNO  PD VSA
BECOMES    1V         VI2S  ADJ-COMPLEMENT VI  PREDICATE VERB             VXVI21  PD VSAZEJJN3A
A          1CA        ART   PRO-ADJECTIVE      COMPLEMENT OF PREDICATE V  N3AAA0  PD VSA2MNN6A
HINDRANCE  1C         NOUS  NOUN 1             COMPLEMENT OF PREDICATE V  N6KMK0  PD VSAZMN
WHEN       1BR        C02   ADVERB CONJ 1      CONJUNCTION                CMC022  PL VSACMNVZG1ZA
IT         1BS        PRNS  PERSONAL PRN NOM

The fifth column, however, depends also on the actual
sentence structure as diagnosed by the computer program.
That is, it depends on what rules of the context-free
grammar, what permissible grammatical constructions, were
employed in order to yield this successful parsing of the
sentence. And the final columns have to do with the way
the parsing is carried out, with the push-down store
operating at each step of the way.

A continuing analysis of this parsing will, unfortunately,
show that it does not completely match one's intuitive
analysis of the later portions of the sentence. The second
column indicates that becomes (fourth word from the end) is
taken to be parallel with becomes (second word of the
sentence). That is, it is taken to be part of a compound
predicate of the word money. But most of us would take this
word to be part of a compound predicate of the word it
(sixth word in the sentence). The distinction, from the
standpoint of "meaning," is not a trivial one at all. The
way this parsing "reads" the sentence is (in reduced form):
Money becomes a hindrance . . . and becomes a goal itself.
Whether the distinction would be important or trivial for
a particular analysis would, however, depend on the
empirical situation.

A variant parsing appears in the next illustration,
Figure IX-3. This was the twenty-fourth "successful"
parsing of this sentence, and shows a number of changes
from the first one. We see that becomes (fourth from the
end) is here diagnosed as parallel with ceases, as it
should be, and therefore is part of the compound predicate
of it. (There is a rather subtle change in another way
here, however, in the diagnosis of the role of the infinitive
to aid.)

Parsing went on and on, until there were 10^ parsings
of this sentence alone (a very high number in the present
trials). The system has no way of automatically picking
the "right" parsing from among the competitors. The
knowledge about the world and about language habit which
informs our own analysis has no present analogue in the
serious and large-scale parsing programs. This is just
the trouble with the present parsing systems, and with
present linguistic knowledge, as was pointed out in an
invited address by Anthony Oettinger to the 1967 Meeting
of the American Documentation Institute.

But incomplete as our knowledge is, such analysis
may still have much diagnostic interest and value. A great
many branches of the parsing tree are pursued in such
attempts, and information from these searches may have
statistical value. Figures IX-4 and IX-5 show some of the
statistical information which is produced by the Kuno
algorithm. Such information may be useful for diagnosis
of student errors, but an explanation of this possibility
would take more space here than would be appropriate.

There may also be actuarial value in the ability of
the program to parse any given sentence. The 50 student
sentences were analyzed independently by an English scholar
(Michael J. Zieky) as well as by the Kuno program, and the
resulting two-way contingency layout is shown in Table IX-1.
In this table the columns represent the human judgments of
the 50 sentences, whether they were believed "grammatical"
or "not grammatical." We see that 29 were grammatical, and
21 not so. On the other hand, the rows represent the
ability of the program to find a successful parse for each
of the 50 sentences. We find here that there were 29
successfully parsed, and 21 for which no parse was found.

TABLE IX-1

THE RELATION OF COMPUTER PARSING TO

Chi square = 15.04 (p < .001)*

Contingency coefficient = .48

*Data were not independent. See discussion in text.

We find a very clear relation between the rows and the
columns of this table. In fact, if these sentences might
be assumed to be independent of one another, the resulting
chi square of 15.04 would be significant beyond the .001
level of confidence. And the related contingency
coefficient would be a healthy .48. In other words, the ability
of the program to parse a sentence would have some
predictive power for whether the sentence would be judged
grammatical by an expert human. Because of the casual way
these sentences were drawn for computer analysis, the
assumption of independence is not warranted; but the general
trend of the results still suggests actuarial value in the
use of such algorithms for computer analysis of essays.
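The two statistics just quoted can be computed as in the sketch below. Two assumptions should be stated loudly. First, the four cell counts are illustrative: they were chosen to agree with the marginal totals of 29 and 21 given in the text, and are not copied from Table IX-1. Second, the sketch applies the Yates continuity correction, which is our assumption about how the reported chi square was obtained. The contingency coefficient is the usual Pearson C = sqrt(chi square / (chi square + N)).

```python
from math import sqrt

def yates_chi_square(a, b, c, d):
    """Chi square for a 2 x 2 table [[a, b], [c, d]] with continuity correction."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient C."""
    return sqrt(chi_square / (chi_square + n))

# Illustrative cells (hypothetical), rows = parsed / not parsed,
# columns = judged grammatical / not grammatical.
parsed_gram, parsed_not = 24, 5
unparsed_gram, unparsed_not = 5, 16

chi_square = yates_chi_square(parsed_gram, parsed_not, unparsed_gram, unparsed_not)
c_coefficient = contingency_coefficient(chi_square, 50)
```

With these illustrative cells the sketch gives a chi square of about 15.04 and a coefficient of about .48, in agreement with the reported values; the caution about non-independence of the sentences applies to any such computation.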

The data from the comparison are presented in a
different way in Table IX-2. Here we are able to review
the computer analysis of the sentences. Ideally, of course,
every sentence should produce only one parse, and that one
should be the same as that of an expert human. Nevertheless,
it is important that those sentences which were grammatical
had, on the average, many more completed parsings than
those which were not grammatical. And it is interesting
that the median number of parsings for grammatical
sentences was 3, but 0 for the ungrammatical ones.

It is also interesting to observe, in Table IX-2, the
order in which the correct parsings occurred. Only 16
parsings were judged as intuitively faultless. Seven of
these occurred on the 1st trial, 6 on the 2nd, and the
others as shown. The present Kuno program, outstanding as
it is, has made no provision for statistical optimization,
and this performance should be improvable in some
appropriate adaptation.

In order to have a similar parser for experimental
purposes, we have undertaken to make a PL/I version of the
predictive parser, programmed for the Project by Gerald
Fisher, and listed in Appendix D. Appendix D also has the
flowchart of that parser, which may help the reader new to
such strategies to understand their nature. This program,
called PARSE, has been debugged and tested with artificial
information, but not yet with natural language. Such a
parsing program is only the vehicle; the content must be
furnished by (1) a suitable dictionary, and (2) a suitable
set of grammatical rules. This brief chapter is not the
place to set down all the considerations which will play
a role in any further development of such linguistic
processors, but a few points will be suggested by the next
sections.
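The division of labor between vehicle and content can be shown in a toy sketch of a multiple-path predictive analyzer. It is written in Python, not PL/I, and is neither the Kuno analyzer nor the PARSE program of Appendix D. The grammar, the category names (SE, VP, NP, PD, NOU, VRB, PRD), and the three-word dictionary are all hypothetical. Each rule says that a prediction on top of the push-down store, met by a word of a given class, is replaced by a list of new predictions; every homograph and every applicable rule opens a separate path.

```python
def parse_all(words, dictionary, rules, start="SE"):
    """Return every successful analysis as a list of (word, word_class) pairs."""
    results = []

    def step(position, stack, chosen):
        if position == len(words):
            if not stack:                      # every prediction was fulfilled
                results.append(list(chosen))
            return
        if not stack:
            return                             # words remain but nothing is predicted
        top, rest = stack[-1], stack[:-1]
        word = words[position]
        for word_class in dictionary.get(word, []):
            for new_predictions in rules.get((top, word_class), []):
                # Push so that the first new prediction ends up on top.
                next_stack = rest + tuple(reversed(new_predictions))
                chosen.append((word, word_class))
                step(position + 1, next_stack, chosen)
                chosen.pop()

    step(0, (start,), [])
    return results

# Content: a toy dictionary with homographs, and toy rules.
DICTIONARY = {"time": ["NOU", "VRB"], "flies": ["NOU", "VRB"], ".": ["PRD"]}
RULES = {
    ("SE", "NOU"): [("VP", "PD")],   # subject noun, then verb phrase and period
    ("SE", "VRB"): [("NP", "PD")],   # imperative verb, then object and period
    ("VP", "VRB"): [(), ("NP",)],    # intransitive or transitive verb
    ("NP", "NOU"): [()],
    ("PD", "PRD"): [()],
}

parses = parse_all(["time", "flies", "."], DICTIONARY, RULES)
no_parse = parse_all(["flies", "."], DICTIONARY, RULES)
```

Here the three-word input yields two competing analyses, while the two-word input yields none. Nothing in the procedure itself chooses between the two; any preference would have to come from the dictionary, the rules, or some added statistical weighting.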

Discourse analysis. Further pondering of the sentence
parsing will reveal some difficulties not considered by the
multiple path analyzer. If one is concerned about "meaning"
and about how a machine "reads" a sentence, then one must
arrange for the prose of an essay to hang together, in some
sort of cognitive net. The token sentence of Figure IX-2
will illustrate this problem. No provision is made for
the analysis of antecedents or referents: the pronoun
it is not tied in any mechanical way to the word money
which it presumably renames. But pronouns are not the only
offenders in such a simplified analysis. In most prose,
such as scientific writing, a large proportion of the nouns
refer in some abbreviated way to persons, objects, or ideas
which have already been treated in the writing. The human
reader at once connects these new expressions with those
which have gone before, but how this is accomplished is not
yet understood very clearly.

J. Olney and D. Londe, of the System Development
Corporation, are among the very few who have given
computational attention to this problem, and their brief
writings are not yet ready for any broad dissemination (personal
communications). There are clearly some explicit cues
which may be helpful (such as number, gender, person).
There are synonym relationships also, some of which may be
discovered through mechanical use of a large dictionary.
There are also questions of proximity; other things being
equal, one would expect the most recent candidate for
referent to be operative. Standard techniques of optimization
may weight such criteria appropriately and may make a
best-guess selection of reference for pronouns or other anaphoric
expressions. A great deal of work is necessary, then, in
this field of discourse analysis.
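A best-guess selection of this kind might be sketched as a weighted score over the cues just named. Everything specific in the sketch is an assumption: the feature coding, the weights, the inverse-distance form of the proximity term, and the example sentence are hypothetical, and this is not the procedure of Olney and Londe.

```python
DEFAULT_WEIGHTS = {"number": 2.0, "gender": 2.0, "person": 2.0, "proximity": 1.0}

def referent_score(pronoun, candidate, weights=DEFAULT_WEIGHTS):
    """Sum the weights of agreeing cues, plus a bonus for nearness."""
    score = 0.0
    for cue in ("number", "gender", "person"):
        if candidate[cue] == pronoun[cue]:
            score += weights[cue]
    # distance = number of words between candidate and pronoun (at least 1)
    score += weights["proximity"] / candidate["distance"]
    return score

def choose_referent(pronoun, candidates, weights=DEFAULT_WEIGHTS):
    """Return the candidate with the highest score."""
    return max(candidates, key=lambda c: referent_score(pronoun, c, weights))

# Hypothetical sentence: "The students saved their money until it was gone."
it = {"number": "singular", "gender": "neuter", "person": "third"}
candidates = [
    {"word": "students", "number": "plural", "gender": "common",
     "person": "third", "distance": 5},
    {"word": "money", "number": "singular", "gender": "neuter",
     "person": "third", "distance": 2},
]
best = choose_referent(it, candidates)
```

In the token sentence of Figure IX-2, by contrast, money and hindrance agree with it on all three explicit cues, so proximity alone would favor hindrance; the weights would therefore need empirical tuning, and probably further cues, before such a scheme could be trusted.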

Transformational grammar. Of course, one of the most
active areas for current linguistic research is in
transformational grammar. Treatments of this topic may be found
in a number of references (e.g., Hays, 1967, ch. 8).
Perhaps the best recent treatment of the topic, especially
from the viewpoint of computation, is by Keyser and Patrick
(in press), both of whom have served as consultants for
Project Essay. Perhaps the most useful program for
transformational analysis is that described in Patrick's thesis
(1965).

John Moyne and David Loveman, at the IBM Boston
Programming Center, have programmed a very limited system
which carries analysis through a syntactic analysis to a
transformational analysis, and prints out appropriate
answers to questions. Like all such extant systems, this
one is for a special purpose, in this case document
retrieval from a large library. And they have processed a
few student sentences, from Project Essay, on an
experimental basis, through their first, surface-structure parser.

Semantics. In general, transformational grammars are
far from any linguistic perfection, and face deep problems
which will not be described here. Yet there are approaches
to the question of meaning which have some demonstrable
usefulness and power, and which may sidestep these deepest
problems for the purposes of application. Some of these
are generally described as involved with "semantics," and
some are framed within the practical problem of
question-answering systems. Still others are spoken of in terms of
information storage and retrieval, especially what is
spoken of as "fact retrieval," as contrasted with "document
retrieval."

These works share a common concern with the way that
information may be read into some data representation in
the computer, and how it may then be made accessible for
further use. William A. Woods (1967), for example, took
for granted the output from some syntactic and
transformational parsing system, and then asked how he could develop
a question-answering system. His particular corpus was
flight information from the Airlines Guide, and he worked
out operators for logical comparison and other semantic
concerns which would implement such a system. In doing so,
he built upon earlier work with BASEBALL (see Feigenbaum
and Feldman, 1963, Sec. 5), and similar systems, but went
beyond his predecessors in certain important ways. Other
new work is that of Quillian (1966), who has provided a way
of storing semantic relationships. His structures permit
comparison between two statements, and make possible
judgments about them concerning their agreement, disagreement,
or irrelevance.

The importance of symbolic logic in such systems is
apparent in the recent work by Levien and Maron (1967).
These authors use the predicate calculus, with binary
relations only, as a universal tool for fact storage. They
organize a data base which has four different ways of random
access (corresponding to sentence number, relation name,
and the two elements) for rapid retrieval of the fact through
any of its components. Their method is wasteful of storage
space, but extremely rapid in operation, able to locate any
fact without poring through lists. Their system thus enjoys
some important virtues of the psychological models.
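
The four-way access just described can be pictured with a small
sketch. This is only an illustration of the idea of indexing each
binary fact by sentence number, relation name, and both elements;
it is not the Levien and Maron program, and the class name
FactStore, its methods, and the sample relations are assumptions
made for the example.

```python
from collections import defaultdict


class FactStore:
    """Binary-relation facts, each reachable through four indexes."""

    def __init__(self):
        self.facts = []                       # sentence number -> fact
        self.by_relation = defaultdict(list)  # relation name -> numbers
        self.by_first = defaultdict(list)     # first element -> numbers
        self.by_second = defaultdict(list)    # second element -> numbers

    def add(self, relation, first, second):
        """Store one fact and register it under every index."""
        number = len(self.facts)
        self.facts.append((relation, first, second))
        self.by_relation[relation].append(number)
        self.by_first[first].append(number)
        self.by_second[second].append(number)
        return number

    def lookup(self, relation=None, first=None, second=None):
        """Return facts matching every component that was supplied."""
        candidates = None
        indexes = (
            (self.by_relation, relation),
            (self.by_first, first),
            (self.by_second, second),
        )
        for index, key in indexes:
            if key is not None:
                hits = set(index.get(key, ()))
                candidates = hits if candidates is None else candidates & hits
        if candidates is None:                # no component given: all facts
            candidates = range(len(self.facts))
        return [self.facts[n] for n in sorted(candidates)]


store = FactStore()
store.add("queen_of", "Isabella", "Spain")
store.add("sailed_for", "Columbus", "Spain")
print(store.lookup(second="Spain"))
```

The redundancy is deliberate: every fact is entered in several
indexes, which costs storage but lets any one component lead
directly to the fact without a search through lists.
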

There is much work going on, then, in fields with
important relevance for the future of essay analysis. It
takes the form of progress in linguistics, psychology, and
computer science, and elements of statistics and logic
have a bearing as well. Surely, Project Essay must
maintain its close contacts with these fields in relation to
its future work.



3. Future Work in Essay Analysis



Need for Flexibility. The sub-discipline of computer
analysis is only now beginning to take shape. In the
meantime, as we have seen, work in the area seems to call for
a rather unusual approach: interdisciplinary, broad in
purpose, and flexible.

In its present development, the computer analysis of
essays does not yet lend itself to the clear, Fisherian,
"classical" experimental designs, because not all
operations can be foreseen. It does, however, permit clear
procedures of dynamic development and exploration at each
stage of the study, and verification of accomplishment at
the end. Properly understood, these characteristics are
not handicaps, but symptoms of large research scale. In a
recent paper, Baker (1965) pointed out that the larger and
more exploratory research project "must be inherently dynamic
and possess the ability to change its internal structure
without sacrificing the rigor of the design" (p. 15). And
another writer (Doyle, 1965) has recently stated that as a
study approaches the "basic research end of the spectrum,
it becomes more and more imperative to be free to alter the
plan. Indeed, in basic research altering the plan ought
to be a state of mind." With the present study, it would be
mistaken and even misleading to commit the investigation
prematurely to too narrow a path.

The first phases of this study illustrate this point.
In the earlier work, only the most general goal then, as
now, was completely operational, foreseeable, and attainable:
the maximization of the correlation between computer-analyzed
prose characteristics and the human judgments of the prose.
The earlier work has reached this goal (so far as possible
during the time permitted), but many paths were altered along
the way. Programming plans were modified and improved.
Certain hypotheses were reformulated, and others discarded.
At the conclusion of the first phases, the progress has
been much greater than if the inevitable misconceptions of
the beginning had been adhered to in spite of everything.
The ultimate goal, however, was rigorously adhered to, and
the most careful investigatory techniques employed at each
decision point along the way. In the newer research designs,
what must be done, rather than to make all of the decisions
before the choice points are reached, is to illustrate the
quality of decision-making. This portion of the proposal,
and that which follows, are intended to state the general
objectives and the decision-making strategy by which these
goals will be attained.

As noted before, the work reported here has already
identified useful computer-analyzable indicators of student
writing skill, and has demonstrated the potential
feasibility of overall theme evaluation by computers. When
holistic grades are desired, or ratings of important essay
traits, the PEG computer program already assigns marks as
accurately (measured against the criterion of multiple
expert judgments) as the individual, trained English teacher.
Future work should extend this to the analysis and
evaluation of content, and deepen it linguistically and
psychologically by investigation of more humanoid processes.

Some general future objectives may be outlined:

1. To expand consideration to essay content as well
as style.

2. To explore the relation of dictionary strategies
to successful analysis, and to develop optimum strategies
for the Random House Dictionary tape.

3. To analyze computer-generated data in relation to
subjective measures of content and style in the early
secondary years, to increase usefulness of analysis.

4. To improve the programming of on-line correction
of essays, and on-line feedback to the student or teacher.

5. To identify future strategies for deeper
exploration of this new field of educational technology.

Grading of content. Just as we have opened up the
possibility of grading the esthetic traits of an essay in
English, so we should also be interested in the possibility
of judging the substantive content of essay material, apart
from the general writing ability of the student. This is
a dimension of essay analysis not yet attempted within this
project, yet it may be approached at a number of different
levels of sophistication, and some of these might prove
both economical and rewarding. Let us consider a sample
problem in American history, to conceptualize these various
levels, first heuristically, and finally in more hypothetical
but technical detail.

Suppose we wished to grade children on the factual
content of an essay about the discovery of America. It
might be supposed that certain words or phrases should appear
in the more complete essays: Columbus, Christopher,
Ferdinand, Isabella, king, queen, Spain, Azores, 1492, Nina,
Pinta, Santa Maria, Indians, etc. These words and others
could be fed into core as a kind of dictionary, much as has
been done already with such lists as prepositions,
misspellings, common words, etc. Each first use of any of these
Columbus expressions could be scored in some fashion. No
doubt such scores would be positively correlated with
"factual completeness" ratings as assigned by human judges.
Such scoring would therefore be an aid in achieving the
simulation sought for in Quadrant I.A of Figure II-1.
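
The lexical level just described might be sketched as follows.
This is a minimal illustration, not the PEG routine: the function
name lexicon_score, the normalization of the text, and the choice
of one point per expression are all assumptions made for the
example. Each expression is credited once, on its first use, no
matter how often it recurs.

```python
import re

# Hypothetical checklist of "Columbus expressions" (lower case).
COLUMBUS_LEXICON = {
    "columbus", "christopher", "ferdinand", "isabella", "king",
    "queen", "spain", "1492", "nina", "pinta", "santa maria",
    "indians",
}


def lexicon_score(essay, lexicon=COLUMBUS_LEXICON):
    """Count how many checklist expressions occur at least once."""
    # Reduce the essay to lower-case words separated by single
    # blanks, so that two-word expressions can be matched as well.
    words = re.findall(r"[a-z0-9]+", essay.lower())
    padded = " " + " ".join(words) + " "
    return sum(1 for term in lexicon if f" {term} " in padded)


print(lexicon_score("Queen Isabella of Spain paid for the Santa Maria."))
```

A score of this kind could then be entered as one more predictor
among the variables already used in the regression against human
ratings.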

Suppose we asked for meaningful relationships among
these and other words. One evidence of such a relationship
might be to have the word Isabella occur in the same sentence
as the phrase queen of Spain. And such use within the same
sentence should perhaps receive a higher score than use
in different sentences. Again, the consideration is actuarial,
yet now the statistical analysis is one small step closer to
a meaningful relationship between ideas.
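
A sketch of this contextual level, under stated assumptions:
sentences are split naively at end punctuation, and the scores of
2 for use within one sentence and 1 for use in separate sentences
are arbitrary illustrative weights, as is the function name
cooccurrence_score.

```python
import re


def cooccurrence_score(essay, term_a, term_b, same_sentence=2, separate=1):
    """Score two expressions higher when they share a sentence."""
    sentences = [s.lower() for s in re.split(r"[.!?]+", essay) if s.strip()]

    def has(sentence, term):
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        return re.search(pattern, sentence) is not None

    if any(has(s, term_a) and has(s, term_b) for s in sentences):
        return same_sentence
    in_a = any(has(s, term_a) for s in sentences)
    in_b = any(has(s, term_b) for s in sentences)
    return separate if in_a and in_b else 0


print(cooccurrence_score("Isabella was the queen of Spain.",
                         "Isabella", "queen of Spain"))
```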

Of course, at a somewhat higher level, we would not
wish too high a premium placed on arbitrary words, so we
might include monarch or sovereign in core storage as
acceptable equivalents of queen, or Isabel as an acceptable
form of Isabella. Synonyms could possibly be scored quite
sensitively, according to their judged "semantic distance"
from the most desired words.
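
The weighting of synonyms might look like the sketch below. The
table of equivalents and its weights (1.0 for the most desired
word, smaller values for more distant forms) are invented for the
illustration; in practice they would come from judged semantic
distances. The function name concept_credit is likewise an
assumption.

```python
# Hypothetical table: concept -> acceptable forms and their weights.
EQUIVALENTS = {
    "queen": {"queen": 1.0, "monarch": 0.8, "sovereign": 0.8},
    "isabella": {"isabella": 1.0, "isabel": 0.9},
}


def concept_credit(words, equivalents=EQUIVALENTS):
    """Give each concept the weight of the best form the writer used."""
    present = {w.lower() for w in words}
    return {
        concept: max(
            (weight for form, weight in forms.items() if form in present),
            default=0.0,
        )
        for concept, forms in equivalents.items()
    }


print(concept_credit("Isabel was the monarch of Spain".split()))
```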

Or we could look further for meaning, by asking that,
within the sentence, Isabella and queen (or equivalents)
be in some standard form suggesting identity. Some common
ways this might be done are as a title (Queen Isabella),
as an appositive (Isabella, queen ... or queen ...,
Isabella), or as a predicate nominative (Isabella ...
[form of to be] ... queen, or inverse). And such evidence
of identity could be scored somewhat more highly than the
appearance together without such evidence.
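
The three standard forms could be approximated, very crudely, by
pattern matching within a sentence. The sketch below is an
assumption-laden stand-in for real syntactic analysis: the word
lists, the regular expressions, and the function name
identity_forms are all hypothetical, and many legitimate
constructions would escape such surface patterns.

```python
import re

QUEEN = r"(?:queen|monarch|sovereign)"
NAME = r"(?:isabella|isabel)"
BE = r"(?:is|was|were|are|be|been|being)"

PATTERNS = {
    # Title: "Queen Isabella"
    "title": re.compile(rf"\b{QUEEN}\s+{NAME}\b"),
    # Appositive: "Isabella, the queen ..." or "the queen ..., Isabella"
    "appositive": re.compile(
        rf"\b{NAME}\s*,\s*(?:the\s+)?{QUEEN}\b"
        rf"|\b{QUEEN}\b[^,.]*,\s*{NAME}\b"
    ),
    # Predicate nominative: "Isabella was the queen" or the inverse
    "predicate_nominative": re.compile(
        rf"\b{NAME}\s+{BE}\s+(?:the\s+|a\s+)?{QUEEN}\b"
        rf"|\b{QUEEN}\b[^,.]*\s{BE}\s+{NAME}\b"
    ),
}


def identity_forms(sentence):
    """List which identity forms appear in one sentence."""
    lowered = sentence.lower()
    return [name for name, pattern in PATTERNS.items()
            if pattern.search(lowered)]


print(identity_forms("Isabella, the queen of Spain, paid for the ships."))
```

A sentence showing any of these forms could then receive the
higher score, and a sentence with mere co-occurrence the lower.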

Now consider a much more advanced system. Note that
if we have a sufficiently sophisticated general dictionary
available, and an adequate general sentence analyzer, we
will not need to anticipate each specific equivalent
expression or relationship in each specific essay examination,
in order to score it. We can instead read in a key in the
form of English sentences containing some model narrative
about Isabella and Columbus. Various equivalences would
then be potentially available for the grading of the
student's "own words." But here we are clearly in the I.B
Quadrant of Figure II-1, and are doing a kind of "master
analysis."

The progress here is from the employment of a simple
lexicon of key words, to the acceptance of their synonyms,
to the search for the key words or synonyms in appropriate
contexts, to the search for these meanings in appropriate
relationships.

For any ultimate, applied evaluation of essay content,
a computer program should be no more elaborate than
necessary for the overall goal. If it is much cheaper to use
the lower, lexical strategy, and if it is almost as accurate,
then it would only waste machine time to compute
higher-level information which will not be used. On the other
hand, in some essays the special vocabulary may not be very
important, and other factors may control the evaluation of
merit. The Columbus example seems to depend highly on
vocabulary, but there may be others in which all students
use essentially the same special vocabulary, and the
discriminations are at the higher contextual and relational
levels.

One early discovery needed, then, is the degree to which
most school essay evaluation is dictionary-loaded. And some
workers are addressing themselves to this need, with some
college level examinations, at the time of writing.

Another purpose is to seek more advanced strategies of
semantic analysis, of the contextual or relational sort.
These strategies have some antecedents as well. Most
techniques of information retrieval, for example, are based upon
co-occurrences (cf. pp. 310-353 in Garvin, 1963). And the
usual application of the General Inquirer system employs
such contextual techniques (Stone, 1966). As we have said,
still more advanced systems of relational semantic analysis
have been programmed by such workers as Woods (1967), or
John Moyne (of IBM's Boston Programming Center). An
impressive attack on the problem of artificial memory appropriate
to such relationships has been made by M. Ross Quillian
(1966). These workers have already consulted informally
with this project, and will be available for further work,
and some have already done some pilot tasks.

Further analysis of style. Surely workers should not
abandon the analysis of esthetic quality in writing, but
rather use available advanced strategies of meaning to
further such judgment. To some extent, the analysis of
style must be paired with meaning, and it is hoped that
the next years will see advances in the description of
surface structure, and at least some preliminary
consideration of stylistic traits such as synonymy, contrast, and
parallelism, all of which have a large semantic content.

On a tentative basis, as we have described, some high
school essays have been already analyzed by parsing programs,
one written in FAP for the IBM 7094 by Susumu Kuno (1964) at
Harvard, and the other in PL/I "ELF" by David Loveman and
John Moyne at the Boston Programming Center. These have
indicated that partial parsing is already available, but
that further adaptation of any parser will be necessary. To
some extent, the output of a parser will be used to inform
the semantic analysis. During the next years of such work
parsing will be carried much further than at present.

Hypothetical complete essay analyzer. Some reasonable
future objectives of workers have been stated above. These
are realizable and useful objectives, and can probably be
obtained within reasonable limits of time and effort.
Nevertheless, it is informative to construct a more distant
objective, which would be a set of computer routines tied
together in a more complete and humanoid essay analyzer.

Anticipated future strategies are currently summarized
in Figure IX-6. This figure is based partly on work already
accomplished, partly on suggested minor adaptations of
systems already working for others, and partly on projected
programs which are not yet operative in any system, but which
do not seem impossibly difficult at the efficiency desired.

Figure IX-6

HYPOTHETICAL COMPLETE ESSAY ANALYZER

1. INPUT and PUNCH. Handwritten or typewritten or other raw
response of the writer is converted for computer input.

2. SNTORG. Creates arrays of words and sentences as found in
prose. This is just as performed in PEG, or by a PL/I version
called SCORTXT.

3. DICT. Assignment of available syntactic roles to each word
is currently done by many programs, but needs an expanded
dictionary and ambiguity resolver. At the same time, the
semantic information will be stored in the work-space for
reference of other parts of program. The tape-written Random
House Dictionary (Unabridged) is a very valuable facility for
this work.

4. PARS. A modified Kuno (1964) program [...] is promising,
and the skeleton is now available in PL/I. Alterations will be
necessary to accept well-formed substrings, and to work out
dictionaries and grammars of appropriate power.

5. REFER. This is intended to identify and encode the most
[...] referents of pronouns and other anaphoric [...]. This
process must employ both syntactic features and probably
semantic information from DICT or other sources.

6. KERNEL and STRUC. From the rewritten string, KERNEL would
establish a set of elementary propositions, and STRUC would
encode the relationships among these elements. [...] retain
the information of an essay in simplest [...], yet would
retain additional information about emphasis, subordination,
causal relation, etc., among these units.

7. EQUIV. The elementary units would be augmented by the
semantic [...]. To each word would be [...] permissible
synonyms, with weightings of [...]. [...] permits an analysis
of redundance and [...] permits a comparison of the content of
the student essay [...] that of the key or master essay.

8. STYLE. Descriptions of the surface structure
characteristics of essays: parts of speech, organization of
themes, [...] of sentence structure, grammatical depths,
[...] of reference, etc.; information about grammatical
errors and strengths.

9. CONTNT. Comparison of the agreement of student [...]
through measure of kernel hits and struc hits, these weighted
by semantic distance of language chosen.

10. SCOR. Multivariate prediction of appropriate profile for
the immediate purpose.



The limitations of space will permit only a few comments
on this figure. For large grading systems, over established
substantive content, it would be possible, for the key or
master essay, to edit by hand the output from certain
routines (especially REFER and STRUC). Of course, four of the
most important routines listed in Figure IX-6 are far from
perfected in any existing programs. Ideally, they would
assume better solutions to certain major, stubborn problems
in computational linguistics.

Indeed, certain steps in this hypothetical essay grader
are close to the heart of some of the most persistent and
troublesome problems in linguistics. Is it necessary that
sentences be syntactically analyzed before mapping into
deep structure? What is the proper role of semantics in
such deep structure? How can the outside knowledge of the
reader be incorporated into the machine analysis? In
general, how may we incorporate some of the intuitive richness
which the literate human brings to his reading?

Surely, in essay analysis workers will not suddenly
resolve all such questions. These questions so torture
linguists as to contribute to the recent official pessimism,
in the United States, about the future of mechanical
translation. After 15 years of effort, mechanical translation
is still regarded as disappointing in quality, and virtually
no sustained output of any machine program would be
ordinarily mistaken for the work of a professional human
translator.

On the other hand, the earliest attempts at essay
grading by computer have, in a very limited way, leaped ahead
of machine translation. And if the expert human ratings of
high school essays may be regarded as an acceptable goal,
then the machine program appears to have reached such a goal
already. For that matter, improved performance, even superior
to that of the individual human expert, appears to be
immediately practicable as well.

The explanation of this advantage, of course, is that
the problem of essay grading as attacked in the current work
is much easier than the problem of machine translation. In
translation, every nuance of the input string should be
accounted for in the output string. In essay grading, only
a certain portion of the input text needs to be accounted
for, and the output does not depend on the existence of any
large language-generating system. High quality machine
translation apparently demands a fair portion of the total
language-manipulating capability of the human, but essay
grading may use only a fraction of it, and may process
language in ways quite different from that of the human being.
For example, our present programs have to date largely ignored
order and sequence in the essays, although to the human the
order of words is, of course, of crucial and unceasing
importance.

Since essay grading can work with such fractional
information, then, why pursue the deeper analysis of Figure
IX-6? Clearly, the purpose is not entirely the same as it
would be for the usual linguist. At any discrete time in
research, what is sought is not necessarily the perfect
humanoid behavior, but rather those portions of that
behavior which, given any current state of the art, will
contribute optimally to efficient and practicable improvements
in output. Indeed, regardless of the eventual perfection
of deep linguistic behavior, for any specific application to
essay analysis, at any one moment, large portions of such
available behavior may be irrelevant, just as it seems that
ordinary human language processing does not usually call
for our full linguistic effort.

Yet we regard it as eventually important to be able to
perform these various kinds of advanced machine analysis
when required. Therefore, the eventual uses of the ideal
essay analyzer may require analytic capability as deep as
may be imagined. Writing out suitable comments for the
student, for example, will in some cases tax any system
which may be foreseen.



Even approximate solutions to these problems, however,
though unsatisfactory for certain scientific purposes,
could make important contributions to the educational
description and evaluation of essays. For such evaluation is
itself probabilistic, limited by imperfect asymptotes of
writer consistency and rater agreement. And such
evaluation therefore does not require, to be practicable and
satisfactory, a deterministic perfection. There is a
fundamental difference in goals which must be realized. As has
been demonstrated here, the output from much cruder
statistical programs has already reached a quality not too remote
from usefulness. The more advanced strategies currently
seem, at least to the present workers, bright with promise
as an ultimate target of such analysis, subject to
alteration and amendment as more is learned about the nature of
essays and about the evaluative process.









In conclusion, this section on the future has aimed,
first, at explaining the special nature of objectives in a
new, exploratory, and developmental research; second, at
briefly listing concise and obtainable objectives; third,
at explaining appropriate goals in the evaluation of
subject-matter content and in the appropriate use of
dictionaries; fourth, at explaining the relation between
objectives of stylistic analysis and objectives of
subject-matter; and fifth, at setting forth ultimate objectives in
a humanoid, hypothetical analyzer which, while it will never
be completely realized, will be a target for the
accomplishment of the immediate future. Surely, the computer analysis
of language will become a permanent feature of the
educational scene.

APPENDIX A
FORTRAN SOURCE LIST 

C     THIS IS THE COMPLETE SOURCE PROGRAM FOR ESSAY ANALYSIS,
C     SUPPORTED BY THE COLLEGE ENTRANCE EXAMINATION BOARD,
C     AND MORE BY THE UNITED STATES OFFICE OF EDUCATION, 1966-67,
C     AND CARRIED OUT AT THE BUREAU OF EDUCATIONAL RESEARCH AT THE
C     UNIVERSITY OF CONNECTICUT, IN STORRS, WHERE ELLIS B. PAGE WAS
C     DIRECTOR OF THE PROJECT, AND MR. AND MRS. GERALD FISHER WERE
C     PRINCIPAL PROGRAMMERS.
C
C     IN THE EARLY WORK, HIGH SCHOOL ESSAYS WERE ANALYZED BY
C     COMPUTER FOR 30 VARIABLES. THESE VARIABLES ARE
C     TRANSFORMED BY THE SAME PROGRAM, TO APPROPRIATE SCALES
C     (USUALLY RATIOS). THEN MULTIPLE REGRESSION MAY BE PERFORMED
C     TO PREDICT THE POOLED HUMAN JUDGMENTS OF THESE ESSAYS.
C     THE PRESENT PROGRAM DOES THE CENTRAL TASKS OF SENTENCE
C     ORGANIZATION AND WORD LOOKUP THAT ARE IMPORTANT IN ALMOST
C     ANY NATURAL-LANGUAGE ANALYSIS.
C
C     ADAPTIONS OF THIS PROGRAM MAY BE MADE RATHER EASILY.
C     INQUIRIES MAY BE ADDRESSED TO DR. PAGE AT THE BUREAU OF
C     EDUCATIONAL RESEARCH, UNIVERSITY OF CONNECTICUT.
C
C     THIS IS THE MAIN PROGRAM, CONTAINING
C     THE GROSS LOGIC FOR PROCESSING THE ESSAYS.
C     THE BEGINNING STATEMENTS OF EACH PROGRAM SPECIFY THE
C     INTERRELATIONSHIPS AMONG THE PARTS OF STORAGE AND WORD TYPES
C     AND WORD NAMES.
C     THESE STATEMENTS ALLOW FLEXIBILITY IN WORD HANDLING,
C     COMPRESSION IN PROGRAMMING, AND THE NECESSARY COMMUNICATION
C     LINKS BETWEEN THE INDIVIDUAL SUBPROGRAMS.
C     THE WORDS, THEIR CONTENTS, THEIR ALTERNATIVE NAMES AND THEIR
C     LOCATIONS IN STORAGE ARE DEFINED ON AN ACCOMPANYING
C     ALPHABETIZED LIST.
C
C     BESIDES THE TABLES OF CHECKLIST WORDS, INPUT CONSISTS OF
C     ESSAYS WHICH ARE PUNCHED ONE LINE PER CARD, USING UP TO 80
C     COLUMNS OF THE CARD.
C     THE FIRST CARD OF EACH ESSAY IS PRECEDED BY AN IDENTIFICATION
C     CARD WHICH CONTAINS THE IDENTIFICATION NUMBER OF THIS ESSAY
C     IN COLUMNS 1-5 AND THE TITLE INDICATOR (WHICH IS BLANK IF NO
C     TITLE IS PRESENT).
C     FOLLOWING THE LAST CARD OF EACH ESSAY IS THE END CARD WHICH
C     CONTAINS AN ASTERISK IN COLUMN 1, AND A BLANK IN COLUMN 2.
C     FOLLOWING THE LAST END CARD IS THE END OF JOB CARD WHICH
C     CONTAINS 99999 IN COLUMNS 1-5.
C
C     THE OUTPUT CONSISTS OF PRINTED LINES, ONE FOR EACH SENTENCE,
C     AND AN ADDITIONAL ONE FOR EACH ESSAY CONTAINING IN ARRAY
C     ORDER THE CONTENTS OF THE SUMS ARRAY AND THE TOT ARRAY.
C     THESE CONTENTS ARE DESCRIBED ON THE ALPHABETIZED LIST.
C
C     FOLLOWING IS THE SET OF TRANSFORMATIONS, FROM WHICH
C     THE REGRESSION ANALYSIS MAY BE RUN.
C
C     THE SUMMARY ESSAY DATA ARE ALSO PUNCHED IN CARDS FOR
C     IMMEDIATE USE.
C
C     WE WRITE THE PRINTED INFORMATION ON TAPE UNIT 0
C     WE WRITE THE TOTAL INFORMATION ON TAPE UNIT 4
C     READ THE WORD LISTS AGAINST WHICH PARTS OF
C     EACH ESSAY WILL BE CHECKED
C
C NAME   COMMON TYPE    ARRAY NAME   CONTENTS
C A      CHAR   REAL    ALPHA(1)     ABBBBB
C APOSTR CHAR   REAL    PUNCT(6)     'BBBBB
C B      CHAR   REAL    ALPHA(2)     BBBBBB
C BLANK  CHAR   REAL    PUNCT(1)     BBBBBB
C BROKUP IN     REAL    BROKUP       80 WORD CARD IMAGE 1 COLUMN PER W
C C      CHAR   REAL    ALPHA(3)     CBBBBB
C CONNEC LISTS  DOUBLE  RDTBL(162)   30 CONNECTIVES
C COMMA  CHAR   REAL    PUNCT(4)     ,BBBBB
C COLON  CHAR   REAL    PUNCT(11)    ..BBBB
C CPAREN CHAR   REAL    PUNCT(8)     )BBBBB
C D      CHAR   REAL    ALPHA(4)     DBBBBB
C DALE   LISTS  DOUBLE  RDTBL(222)   3000 DALE LIST WORDS
C DASH   CHAR   REAL    PUNCT(15)    --BBBB
C DECLAB LISTS  DOUBLE  RDTBL(10222) 10 WORDS TO IDENT DECLA B
C DECPT  CHAR   REAL    PUNCT(3)     .BBBBB
C E      CHAR   REAL    ALPHA(5)     EBBBBB
C ENDPCT OUT    INTEGER SUMS(28)     [...] FOR PUNCT AT END OF SENTENC
C EXCLAM CHAR   REAL    PUNCT(13)    .XBBBB
C F      CHAR   REAL    ALPHA(6)     FBBBBB
C G      CHAR   REAL    ALPHA(7)     GBBBBB
C H      CHAR   REAL    ALPHA(8)     HBBBBB
C HLFTXT IN     REAL                 REAL STORAGE OF CURRENT SENTENCE
C HYPHEN CHAR   REAL    PUNCT(5)     -BBBBB
C I      CHAR   REAL    ALPHA(9)     IBBBBB
C ID     OUT    INTEGER SUMS(1)      IDENT NO THIS ESSAY
C ITALIC CHAR   REAL    PUNCT(16)    (/)BBB
C J      CHAR   REAL    ALPHA(10)    JBBBBB
C K      CHAR   REAL    ALPHA(11)    KBBBBB
C L      CHAR   REAL    ALPHA(12)    LBBBBB
C LENGTH IN     INTEGER LENGTH       1-12 FOR WD LENGTH 99 FOR PUNCT T
C M      CHAR   REAL    ALPHA(13)    MBBBBB
C N      CHAR   REAL    ALPHA(14)    NBBBBB
C NAPOS  OUT    INTEGER SUMS(11)     NO OF APOSTROPHES THIS SENTENCE
C NCOMMA OUT    INTEGER SUMS(12)     NUMBER OF COMMAS THIS SENTENCE
C NCOLON OUT    INTEGER SUMS(17)     NO OF COLONS THIS SENTENCE
C NCONN  OUT    INTEGER SUMS(23)     NO OF CONNECTIVES THIS SENTENCE
C NDASH  OUT    INTEGER SUMS(16)     NO OF DASHES THIS SENTENCE
C NDALE  OUT    INTEGER SUMS(27)     NO OF DALE WORDS THIS SENTENCE
C NEXCLA OUT    INTEGER SUMS(20)     NO OF EXCLAMATION PTS THIS SENTEN
C NPAREN OUT    INTEGER SUMS(10)     NO OF PARENTHESES THIS SENTENCE
C NPER   OUT    INTEGER SUMS(13)     NO OF PERIODS THIS SENTENCE
C NPERCT OUT    INTEGER SUMS(14)     NO OF PERCENT SIGNS THIS SENTENCE
C NPREP  OUT    INTEGER SUMS(22)     NO OF PREPOSITIONS THIS SENTENCE
C NQUOTE OUT    INTEGER SUMS(19)     NO OF QUOTES THIS SENTENCE
C NQUES  OUT    INTEGER SUMS(21)     NO OF QUESTION MARKS THIS SENTENC
C NRELPR OUT    INTEGER SUMS(25)     NO OF RELATIVE PRONOUNS THIS SENT
C NSEMIC OUT    INTEGER SUMS(18)     NO OF SEMICOLONS THIS SENTENCE
C NSPELL OUT    INTEGER SUMS(24)     NO OF SPELLING ERRORS THIS SENTEN
C NSCONJ OUT    INTEGER SUMS(26)     NO OF SUBORDINATING CONJUNCTIONS
C NUMSEN OUT    INTEGER TOT(3)       NO OF SENTENCES THIS ESSAY
C NUMPAR OUT    INTEGER TOT(4)       NO OF PARA THIS ESSAY
C NUMWDS OUT    INTEGER SUMS(8)      NO OF WORDS THIS SENTENCE
C NUNDER OUT    INTEGER SUMS(15)     NO OF ITALICIZED WORDS THIS SENTE
C NWDSQ  OUT    INTEGER SUMS(9)      SQ OF NO OF WDS THIS SENTENCE
C O      CHAR   REAL    ALPHA(15)    OBBBBB
C OPAREN CHAR   REAL    PUNCT(7)     (BBBBB
C P      CHAR   REAL    ALPHA(16)    PBBBBB
C PARNUM OUT    INTEGER SUMS( )      SEQ NO OF THIS PARAGRAPH
C PERIOD CHAR   REAL    PUNCT(10)    .BBBBB
C PREP   LISTS  DOUBLE  RDTBL(42)    50 PREPOSITIONS
C Q      CHAR   REAL    ALPHA(17)    QBBBBB
C QUEST  CHAR   REAL    PUNCT(14)    .QBBBB
C R      CHAR   REAL    ALPHA(18)    RBBBBB
C RDTBL  LISTS  REAL    RDTBL        10540 WORDS CONTAIN WORD TABLES
C RELPRO LISTS  DOUBLE  RDTBL(142)   10 RELATIVE PRONOUNS
C S      CHAR   REAL    ALPHA(19)    SBBBBB
C SENNUM OUT    INTEGER SUMS(3)      SEQ NO THIS SENTENCE
C SENTYP OUT    INTEGER SUMS(29)     1 IF DECLAR A, 0 IF NOT
C SENTYP OUT    INTEGER SUMS(30)     1 IF DECLAR B, 0 IF NOT
C SENTYP OUT    INTEGER SUMS(31)     1 IF EXCLAM, 0 IF NOT
C SENTYP OUT    INTEGER SUMS(32)     1 IF QUESTION, 0 IF NOT
C SEMIC  CHAR   REAL    PUNCT(12)    .,BBBB
C SLASH  CHAR   REAL    PUNCT(9)     /BBBBB
C SPELLX LISTS  DOUBLE  RDTBL(6222)  2000 COMMONLY MISSPELLED WORDS
C SSQLET OUT    INTEGER SUMS(7)      SUM OF SQ OF LETTERS BY WORD THIS
C STAR   CHAR   REAL    PUNCT(2)     *BBBBB
C SUBVER OUT    INTEGER SUMS(5)      1 FOR S-V TYPE OPEN 0 FOR NO
C SUMLET OUT    INTEGER SUMS(6)      SUM OF THE LETTERS THIS SENTENCE
C SUBCON LISTS  DOUBLE  RDTBL(2)     20 WDS FOR SUB CONJ TEST
C SVOPEN OUT    INTEGER TOT(5)       NO OF SENT OPENING S-V
C SVOPN  LISTS  DOUBLE  RDTBL(10242) 150 WORDS FOR S-V OPEN TEST
C T      CHAR   REAL    ALPHA(20)    TBBBBB
C TAPOS  OUT    INTEGER TOT(11)      NO OF APOSTROPHES THIS ESSAY
C TCOMMA OUT    INTEGER TOT(12)      NO OF COMMAS THIS ESSAY
C TCOLON OUT    INTEGER TOT(17)      NO OF COLONS THIS ESSAY
C TCONN  OUT    INTEGER TOT(23)      NO OF CONNECTIVES THIS ESSAY
C TDASH  OUT    INTEGER TOT(16)      NO OF DASHES THIS ESSAY
C TDALE  OUT    INTEGER TOT(27)      NO OF DALE WORDS THIS ESSAY
C TEXCLA OUT    INTEGER TOT(20)      NO OF EXCLAMATION PTS THIS ESSAY
C TENDPT OUT    INTEGER TOT(28)      NO OF SENT WITH NO END PNCT THIS
C TEXT   IN     DOUBLE  HLFTXT       ASSEMBLED WORDS OF THIS SENTENCE
C TID    OUT    INTEGER TOT(1)       IDENT NO THIS ESSAY
C TITLE  OUT    INTEGER SUMS(2)      1 IF YES TITLE, 0 IF NO TITLE
C TOTLET OUT    INTEGER TOT(6)       SUM OF LETTERS THIS ESSAY
C TOTWDS OUT    INTEGER TOT(8)       NO OF WORDS THIS ESSAY
C TPAREN OUT    INTEGER TOT(10)      NO OF PARENTHESES
C TPER   OUT    INTEGER TOT(13)      NO OF PERIODS THIS ESSAY
C TPERCT OUT    INTEGER TOT(14)      NO OF PERCENT SIGNS THIS ESSAY
C TPREP  OUT    INTEGER TOT(22)      NO OF PREPOSITIONS THIS ESSAY
C TQUOTE OUT    INTEGER TOT(19)      NO OF QUOTES THIS ESSAY
C TQUES  OUT    INTEGER TOT(21)      NO OF QUESTION [...]
C TRELPR OUT    INTEGER TOT(25)      NO OF RELATIVE PRONOUNS THIS ESSA
C TSQLET OUT    INTEGER TOT(7)       SUM OF [...]
C TSEMIC OUT    INTEGER TOT(18)      NO OF SEMICOLONS THIS ESSAY
C TSPELL OUT    INTEGER TOT(24)      NO OF SPELLING ERRORS THIS ESSAY
C TSCONJ OUT    INTEGER TOT(26)      NO OF SUBORDINATING CONJ THIS ESS
C TTITLE OUT    INTEGER TOT(2)       TITLE THIS ESSAY
C TTYPE  OUT    INTEGER TOT(29)      NO OF DECLARATIVE TYPE A SENTENCE
C TTYPE  OUT    INTEGER TOT(30)      NO OF DECLARATIVE B SENTENCES
C TTYPE  OUT    INTEGER TOT(31)      NO OF EXCLAMATORY SENTENCES
C TTYPE  OUT    INTEGER TOT(32)      NO OF QUESTIONS
C TUNDER OUT    INTEGER TOT(15)      NO OF WORDS ITALICIZED THIS ESSAY
C TWDSQ  OUT    INTEGER TOT(9)       SUM OF SQ OF WORDS IN EACH SENT
C U      CHAR   REAL    ALPHA(21)    UBBBBB
C V      CHAR   REAL    ALPHA(22)    VBBBBB
C W      CHAR   REAL    ALPHA(23)    WBBBBB
C X      CHAR   REAL    ALPHA(24)    XBBBBB
C Y      CHAR   REAL    ALPHA(25)    YBBBBB
C Z      CHAR   REAL    ALPHA(26)    ZBBBBB
C

C     COMMON IS AN AREA SHARED BY THIS PROGRAM AND ITS SUBPROGRAMS.
      COMMON/IN/BROKUP,TEXT,LENGTH
C     BROKUP IS AN 80 CHARACTER CARD IMAGE
      REAL BROKUP%80<
C     TEXT IS THE ASSEMBLED WORDS OF THE SENTENCE.
      DOUBLE PRECISION TEXT%100<
C     COMMON SHARES THESE LISTS WITH OTHER PROGRAMS HAVING
C     COMMON/LISTS IN THEIR TEXT.
      COMMON/LISTS/SUBCON,PREP,RELPRO,CONNEC,DALE,SPELLX,DECLAB,SVOPN
C     TYPING ALL WORDS IN THESE ARRAYS AS DOUBLE PRECISION, I.E. THEY
C     CAN HAVE FROM 1-12 CHARACTERS EACH.
      DOUBLE PRECISION SUBCON%20<,PREP%50<,RELPRO%10<,CONNEC%30<,DALE
     1%3000<,SPELLX%2000<,DECLAB%10<,SVOPN%150<
C     THE ARRAYS PUNCT AND ALPHA ARE SHARED WITH ALL PROGRAMS
C     HAVING THE COMMON/CHAR/ STATEMENT.
      COMMON/CHAR/ PUNCT,ALPHA
C     TYPING THE ARRAYS PUNCT AND ALPHA AND THEIR CONTENTS AS REAL.
      REAL PUNCT%20<,BLANK,STAR,DECPT,COMMA,HYPHEN,APOSTR,OPAREN,CPAREN,
     1SLASH,PERIOD,COLON,SEMIC,EXCLAM,QUEST,DASH,ITALIC,ALPHA%26<,A,B,C,
     2D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z
C     THE EQUIVALENCE STATEMENT IS USED SO THAT THE CUMULATIVE ASPECTS
C     OF THE ARRAYS PUNCT AND ALPHA CAN BE INCORPORATED.
      EQUIVALENCE %PUNCT%1<,BLANK<,%PUNCT%2<,STAR<,%PUNCT%3<,DECPT<,%PUN
     1CT%4<,COMMA<,%PUNCT%5<,HYPHEN<,%PUNCT%6<,APOSTR<,%PUNCT%7<,OPAREN<
     2,%PUNCT%8<,CPAREN<,%PUNCT%9<,SLASH<,%PUNCT%10<,PERIOD<,%PUNCT%11<,
     3COLON<,%PUNCT%12<,SEMIC<,%PUNCT%13<,EXCLAM<,%PUNCT%14<,QUEST<,%PUN
     4CT%15<,DASH<,%PUNCT%16<,ITALIC<,%ALPHA%1<,A<,%ALPHA%2<,B<,%ALPHA%3
     5<,C<,%ALPHA%4<,D<,%ALPHA%5<,E<,%ALPHA%6<,F<,%ALPHA%7<,G<,%ALPHA%8<
     6,H<,%ALPHA%9<,I<,%ALPHA%10<,J<,%ALPHA%11<,K<,%ALPHA%12<,L<,%ALPHA%
     713<,M<,%ALPHA%14<,N<,%ALPHA%15<,O<,%ALPHA%16<,P<,%ALPHA%17<,Q<,%AL
     8PHA%18<,R<,%ALPHA%19<,S<,%ALPHA%20<,T<,%ALPHA%21<,U<,%ALPHA%22<,V<
     9,%ALPHA%23<,W<,%ALPHA%24<,X<,%ALPHA%25<,Y<,%ALPHA%26<,Z<
C     THE ARRAYS SUMS AND TOT ARE SHARED WITH ALL PROGRAMS HAVING
C     THE COMMON/OUT/ STATEMENT.
      COMMON/OUT/SUMS,TOT
C     TYPING THE ARRAYS SUMS AND TOT AND THEIR CONTENTS AS INTEGER.
      INTEGER SUMS%100<,ID,TITLE,SENNUM,PARNUM,SUBVER,SUMLET,SSQLET,NUMW
     1DS,NWDSQ,NPAREN,NAPOS,NCOMMA,NPER,NPERCT,NUNDER,NDASH,NCOLO
     2N,NSEMIC,NQUOTE,NEXCLA,NQUES,NPREP,NCONN,NSPELL,NRELPR,NSCONJ,NDAL
     3E,ENDPCT,SENTYP%4<,TOT%100<,TID,TTITLE,NUMSEN,NUMPAR,SVOPEN,TOTLET
     4,TSQLET,TOTWDS,TWDSQ,TOTFND,TPAREN,TAPOS,TCOMMA,TPER,TPERCT,TUNDER
     5,TDASH,TCOLON,TSEMIC,TQUOTE,TEXCLA,TQUES,TPREP,TCONN,TSPELL,TRELPR
     6,TSCONJ,TDALE,TENDPT,TTYPE%4<
C     THE EQUIVALENCE STATEMENT IS USED SO THAT THE CUMULATIVE ASPECTS
C     OF THE ARRAY SUMS CAN BE INCORPORATED.
      EQUIVALENCE %SUMS%1<,ID<,%SUMS%2<,TITLE<,%SUMS%3<,SENNUM<,%SUMS%4<
     1,PARNUM<,%SUMS%5<,SUBVER<,%SUMS%6<,SUMLET<,%SUMS%7<,SSQLET<,%SUMS%
     28<,NUMWDS<,%SUMS%9<,NWDSQ<,%SUMS%10<,NPAREN<,%S
     3UMS%11<,NAPOS<,%SUMS%12<,NCOMMA<,%SUMS%13<,NPER<,%SUMS%14<,NPER
     4CT<,%SUMS%15<,NUNDER<,%SUMS%16<,NDASH<,%SUMS%17<,NCOLON<,%SUMS%18
     5<,NSEMIC<,%SUMS%19<,NQUOTE<,%SUMS%20<,NEXCLA<,%SUMS%21<,NQUES<,%S
     6UMS%22<,NPREP<,%SUMS%23<,NCONN<,%SUMS%24<,NSPELL<,%SUMS%25<,NREL
     7PR<,%SUMS%26<,NSCONJ<,%SUMS%27<,NDALE<,%SUMS%28<,ENDPCT<,%SUMS%29
     8<,SENTYP<
C     THE EQUIVALENCE STATEMENT IS USED SO THAT THE CUMULATIVE ASPECTS
C     OF THE ARRAY TOTS CAN BE INCORPORATED.
      EQUIVALENCE%TOT%1<,TID<,%TOT%2<,TTITLE<,%TOT%3<,NUMSEN<,%TOT%4<,N
     1UMPAR<,%TOT%5<,SVOPEN<,%TOT%6<,TOTLET<,%TOT%7<,TSQLET<,%TOT%8<,TOT
     2WDS<,%TOT%9<,TWDSQ<,%TOT%10<,TPAREN<,%TOT%11<,TAP
     3OS<,%TOT%12<,TCOMMA<,%TOT%13<,TPER<,%TOT%14<,TPERCT<,%TOT%15<,TUND
     4ER<,%TOT%16<,TDASH<,%TOT%17<,TCOLON<,%TOT%18<,TSEMIC<,%TOT%19<,TQU
     5OTE<,%TOT%20<,TEXCLA<,%TOT%21<,TQUES<,%TOT%22<,TPREP<,%TOT%23<,TCO
     6NN<,%TOT%24<,TSPELL<,%TOT%25<,TRELPR<,%TOT%26<,TSCONJ<,%TOT%27<,TD
     7ALE<,%TOT%28<,TENDPT<,%TOT%29<,TTYPE<

C     THESE ARE ADDED EQUIVALENCE STATEMENTS. OTHERWISE THEY COULD HAVE
C     BEEN PLACED IN PRECEDING EQUIVALENCE STATEMENTS.
      EQUIVALENCE %SUMS%33<,NHYPH<,%TOT%33<,THYPH<
      EQUIVALENCE %SUMS%34<,NSLASH<,%TOT%34<,TSLASH<
      INTEGER NSLASH,TSLASH
      INTEGER NHYPH,THYPH
C     SETTING THE NUMBERS OF ITALICIZED WORDS THIS LINE
C     EQUIVALENT TO EACH OTHER.
      EQUIVALENCE %NUNDER,NITAL<
C     TYPING AND DIMENSIONING WORD LENGTHS.
      INTEGER LENGTH%100<
C     TYPING AND DIMENSIONING WORDS IN WORD LISTS.
      REAL RDTBL%10540<
C     THE EQUIVALENCE STATEMENT IS USED HERE TO SPECIFY WHAT PART OF THE
C     ARRAY %RDTBL< CORRESPONDS TO PARTICULAR WORD LISTS
      EQUIVALENCE %RDTBL%1<,SUBCON<,%RDTBL%41<,PREP<,%RDTBL%141<,RELPRO<
     1,%RDTBL%161<,CONNEC<,%RDTBL%221<,DALE<,%RDTBL%6221<,SPELLX<,%RDTBL
     2%10221<,DECLAB<,%RDTBL%10241<,SVOPN<
C     STORAGE OF THE CURRENT SENTENCE
      REAL HLFTXT%200<
      EQUIVALENCE %HLFTXT,TEXT<
C     VARIABLES SHARED BY OTHER PROGRAMS HAVING THE COMMON/LOG/
C     STATEMENT.
      COMMON/LOG/SENTND,ESSEND
C     LOGICAL TESTS
      LOGICAL SENTND,ESSEND
C     MAKING ELEMENTS OF AN ARRAY EQUIVALENT TO CERTAIN PUNCTUATION
      EQUIVALENCE%PUNCT%17<,QUOTE<,%PUNCT%18<,PERCT<
C     TYPING QUOTES AND PER CENTS AS REAL VARIABLES.
      REAL QUOTE,PERCT
C     AN ARRAY %RELN< AND A VARIABLE SHARED BY OTHER PROGRAMS HAVING THE
C     COMMON/PSUM/ STATEMENT.
      COMMON/PSUM/ RELN,NEXT
C     TYPING THE ARRAY RELN AND THE VARIABLE NEXT AS INTEGERS.
      INTEGER RELN%20<,NEXT
C     RELN%I< CONTAINS THE SUMS SUBSCRIPT CORRESPONDING TO PUNCT%I<
C     AN ARRAY OF 10 WORDS SHARED BY OTHER PROGRAMS HAVING THE
C     COMMON/LIST2/ STATEMENT.
      COMMON/LIST2/ SWORD
C     TYPING THE ARRAY SWORD AS DOUBLE PRECISION, I.E. EACH WORD CAN
C     HAVE FROM 1-12 CHARACTERS.
      DOUBLE PRECISION SWORD%10<
C     TYPING THE ARRAY SWRD AS REAL.
      REAL SWRD%20<
      EQUIVALENCE %SWORD,SWRD<
C     A VARIABLE SHARED BY ALL PROGRAMS HAVING THE COMMON/COUNT/
C     STATEMENT.
      COMMON/COUNT/ ICTR
C     TYPING THE VARIABLE ICTR AS AN INTEGER.
      INTEGER ICTR
C     A LOGICAL VARIABLE SET TRUE OR FALSE DEPENDING UPON WORD BEING
C     ANALYZED IN THE SENTENCE.
      LOGICAL XX
C     A LOGICAL VARIABLE SET TRUE OR FALSE DEPENDING UPON WHETHER IT IS
C     THE START OF A NEW SENTENCE OR NOT.
      LOGICAL START
C     INITIALIZING THE IMAGE COUNTER.
      ICTR#1
C     SEE MEMO FOR EXPLANATION %UNIVERSITY COMPUTER SYSTEM 7040-3<
      CALL FPTRAP

C     READ IN CARDS CONSISTING OF PUNCTUATION MARKS AND LETTERS OF THE
C     ALPHABET ACCORDING TO FORMAT STATEMENT 899 WHICH SPECIFIES
C     ARRANGEMENT.
      READ%5,899< PUNCT,ALPHA
C     READ IN CARDS CONTAINING WORDS IN WORD TABLES. THE SPLIT IN THE
C     ARRAY RDTBL IS A RESULT OF HAVING LESS THAN 2000 MISSPELLED
C     WORDS IN THE LIST.
      READ%5,900< %RDTBL%II<,II#1,7650<,%RDTBL%II<,II#10221,10540<
C     READ IN CARDS CONSISTING OF 10 WORDS HAVING S ENDINGS.
      READ%5,908< SWRD
C     2A6 SPECIFIES A DOUBLE PRECISION WORD CONSISTING OF 1-12
C     CHARACTERS.
  908 FORMAT%2A6<
C     READ IN CARDS CONTAINING THE SUMS SUBSCRIPTS CORRESPONDING TO THE
C     PUNCTUATIONS.
      READ%5,905< RELN
C     MEANS STORE THE LOGICAL CONSTANT TRUE IN START.
      START#.TRUE.
C     UNCONDITIONAL GO TO STATEMENT WHICH INTERRUPTS SEQUENTIAL
C     EXECUTION AND DIRECTS FLOW TO STATEMENT 15.
      GO TO 15



C     READ THE FIRST CARD OF THE NEXT ESSAY. IT CONTAINS THE ID NUMBER AND
C     INDICATION OF WHETHER OR NOT THERE IS A TITLE TO THIS ESSAY
C     READ IN FIRST CARD OF ESSAY CONTAINING IDENTIFICATION NUMBER AND
C     TITLE %IF ANY.<
C     ASSIGN LOGICAL CONSTANT FALSE TO ESSEND %ESSAY END.<
      ESSEND#.FALSE.
C     TEST TO DETERMINE WHETHER ALL THE ESSAYS ARE FINISHED. IF A 999
C     CARD IS PRESENT THEN ALL THE ESSAYS HAVE BEEN PROCESSED.
      IF %IDENT.EQ.99999< GO TO 200
C     THIS STATEMENT SAYS THERE IS A TITLE BY ASSIGNING THE NUMBER
C     ONE TO TITLE.
      TITLE#1
C     THIS LOGICAL TEST THEN CHECKS TO SEE IF A TITLE EXISTS. IF TRUE,
C     IT ASSIGNS THE PREDETERMINED VALUE ZERO. IF FALSE, IT RETAINS
C     THE VALUE ASSIGNED IN THE PRECEDING STEP.
      IF %       .EQ.BLANK< TITLE#0
C     REPLACE ID WITH THE IDENTIFICATION NUMBER.
      ID#IDENT
C     ASSIGN LOGICAL CONSTANT FALSE TO SENTND %SENTENCE END<
      SENTND#.FALSE.
C     INITIALIZE THE SEQUENCE NUMBER OF THE SENTENCE.
      SENNUM#0
C     INITIALIZE THE PARAGRAPH NUMBER.
      PARNUM#1
C     READ THE 80 CHARACTER CARD IMAGE.
C     THIS ASSIGNED GO TO TRANSFERS CONTROL TO A CALL STATEMENT TO
C     DETERMINE SENTENCE ORGANIZATION OF THIS ESSAY
      GO TO 100
C     READ IN NEXT 80 CHARACTER CARD IMAGE.
    1 READ%5,902< BROKUP

C     IF EITHER A OR B IS TRUE OR IF BOTH A AND B ARE TRUE.
C     IF NOT EQUAL TO A STAR OR BLANK GO TO 101
      IF%BROKUP%1<.NE.STAR.OR.BROKUP%2<.NE.BLANK< GO TO 101
C     IF ABOVE TEST FALSE THEN SET ESSAY END EQUAL TO TRUE.
      ESSEND#.TRUE.
C     IF ABOVE TEST FALSE SET SENTENCE END EQUAL TO TRUE.
      SENTND#.TRUE.
C     IF NEXT IS EQUAL TO ZERO SET SENTENCE END EQUAL TO FALSE.
      IF %NEXT.EQ.0< SENTND#.FALSE.
C     IF NEXT IS EQUAL TO ZERO GO TO 15 AND INITIALIZE SUMS.
      IF %NEXT.EQ.0< GO TO 15
C     GO TO 100 FOR DETERMINATION OF SENTENCE ORGANIZATION.
      GO TO 100
  101 DO 102 ICT#1,4
C     IF ANY OF THE FIRST FOUR IMAGES NOT BLANK GO TO 100
C     FOR DETERMINATION OF SENTENCE ORGANIZATION.
  102 IF%BROKUP%ICT<.NE.BLANK< GO TO 100
C     INCREMENT PARAGRAPH NUMBER BY ONE.
      PARNUM#PARNUM&1
C     SET SENTENCE END EQUAL TO TRUE.
      SENTND#.TRUE.
      IF %NEXT.EQ.0< SENTND#.FALSE.
C     THIS CALL STATEMENT CALLS THE SUBROUTINE SNTORG WHICH ANALYZES
C     EACH CHARACTER OF THE LINE OF THE ESSAY THAT WAS JUST READ IN. IT
C     FINDS WORDS AND PUNCTUATION MARKS AND ASSEMBLES THE CHARACTERS
C     INTO SENTENCE COMPONENTS. IT ALSO MAINTAINS A COUNT ON
C     SIGNIFICANT SENTENCE ELEMENTS.
  100 CALL SNTORG
C     IF TRUE GO TO ONE AND READ IN NEXT CARD, IF FALSE CONTINUE WITH
C     NEXT STATEMENT.
      IF%.NOT.SENTND< GO TO 1
C     MM REPRESENTS THE NUMBER OF WORDS IN THE SENTENCE.
      MM#2*NEXT
C     WRITE THE SENTENCE ACCORDING TO THE FOLLOWING FORMAT.
      WRITE%6,906< %HLFTXT%II<,II#1,MM<
C     SPECIFIES TEN 12 CHARACTER WORDS PER LINE.
  906 FORMAT%1H-20A6/%1H 20A6<<
C     INCREMENT THE NUMBER OF SENTENCES BY ONE.
      SENNUM#SENNUM&1
C     THIS SUBROUTINE TYPES EACH SENTENCE ACCORDING TO END PUNCTUATION
C     AND FIRST WORD TYPE.
      CALL TYPSEN
C     THIS SUBROUTINE DETERMINES TYPE OF SENTENCE OPENING.
      CALL SEEOPN
C     CHECK EACH WORD AGAINST THE LISTS OF SPECIAL WORDS
C     AND THE LIST OF COMMONLY MISSPELLED WORDS.
C     ICT IS INDEXED ACCORDING TO THE NUMBER OF WORDS IN THE SENTENCE.
      DO 10 ICT#1,NEXT
C     TESTING EACH WORD OF THE SENTENCE. IF LENGTH%ICT< EQUAL TO 99
C     IT IS THE END OF THE SENTENCE.
      IF%LENGTH%ICT<.EQ.99< GO TO 10
C     SET XX TRUE FOR FIRST OR SECOND WORD OF THE SENTENCE.
C     THIS WILL PREVENT IT FROM BEING CHECKED AGAINST THE LIST OF
C     RELATIVE PRONOUNS.
C     REPLACING XX BY ONE OR TWO FOR TEST IN SECOND STATEMENT BELOW.
      XX#ICT.EQ.1.OR.ICT.EQ.2
C     REFERS TO A SUBROUTINE WHICH DETERMINES THE TYPE OF WORD.
      CALL CHKLST%TEXT%ICT<,XX<
C     SEE COMMENT ABOVE WHICH STARTS WITH - SET XX TRUE FOR FIRST, ETC.
      IF%XX< GO TO 10
C     REFERS TO A SUBROUTINE WHICH CHECKS FOR SPELLING.
      CALL SPELXX%TEXT%ICT<<
   10 CONTINUE
C     PRINT THE RESULTS OF THE ANALYSIS OF THIS LINE. IF THIS IS
C     THE LAST SENTENCE OF THE ESSAY, ALSO PRINT THE TOTAL RESULTS FOR
C     THE WHOLE ESSAY
C     WRITE SUMS FOR LINE ON TAPE 0.
      WRITE%0< %SUMS%II<,II#1,34<
C     PRINT SUMS FOR LINE.






APPENDIX A (Continued) 



      WRITE%6,903< %SUMS%II<,II#1,34<
C     ACCUMULATE TOTALS
      DO 11 ICT#1,4
   11 TOT%ICT<#SUMS%ICT<
      DO 12 ICT#5,34
   12 TOT%ICT<#TOT%ICT<&SUMS%ICT<
C     REPLACE SENTENCE END BY FALSE TO WORK WITH NEXT SENTENCE.
      SENTND#.FALSE.
C     INITIALIZING THE CUMULATIVE ASPECTS OF THE PROGRAM. SUMS 1-5 DO
C     NOT REPRESENT CUMULATIVE ASPECTS OF THE PROGRAM.
   15 DO 13 ICT#5,100
   13 SUMS%ICT<#0
C     INITIALIZING NEXT FOR NEXT SENTENCE.
      NEXT#0
C     TEST TO DETERMINE IF BEGINNING OF ANOTHER LINE.
      IF%START< GO TO 16
C     IF TRUE GO TO SNTORG. IF FALSE, THEN WRITE TOTALS FOR ESSAY.
      IF%.NOT.ESSEND.AND.ICTR.GT.80< GO TO 1
      IF%.NOT.ESSEND< GO TO 100
C     WRITING ACCUMULATED TOTALS FOR AN ESSAY ON TAPE UNIT 0.
      WRITE%0< %TOT%II<,II#1,34<
C     WRITING ACCUMULATED TOTALS FOR AN ESSAY ON TAPE UNIT 4.
      WRITE%4< %TOT%II<,II#1,34<
C     PRINTED OUTPUT FOR ACCUMULATED TOTALS FOR AN ESSAY.
      WRITE%6,903< %TOT%II<,II#1,34<
C     TYPING THE VARIABLE X11 %31 VARIABLES< FOR PUNCHED AND PRINTED
C     OUTPUT.
      REAL X11%31<
C     IDENTIFICATION FOR PRINTED AND PUNCHED OUTPUT FOR ESSAY.
      I1#1
      I2#2
C     TOT%3< IS NUMBER OF SENTENCES THIS ESSAY.
      SEN#TOT%3<
C     TOT%8< IS NUMBER OF WORDS THIS ESSAY.
      WDS#TOT%8<
C     TOT%2< REPRESENTS TITLE THIS ESSAY.
      X11%1<#TOT%2<
C     A RATIO SCALE FOR ONE OF THE VARIABLES.
      X11%2<#%WDS/SEN<*10.
C     TOT%4< IS NUMBER OF PARAGRAPHS THIS ESSAY.
      X11%3<#TOT%4<
C     FLOAT TOT%5< MEANS MAKE THE INTEGER TOT%5< REAL AND IT REPRESENTS
C     THE NUMBER OF S-V SENTENCE OPENINGS.
      X11%4<#%FLOAT%TOT%5<</SEN<*100.

Xll%5</(^W0S .. _ . 

C     THE DO 511 LOOP ESTABLISHES THE NUMBER OF PUNCTUATION MARKS
C     IN THE ESSAY ACCORDING TO A PREDETERMINED SCALE.
      DO 511 II#6,14

XI l«I K#*FLO ATtTOTXI I£4«/WDS<*10M^ 

C     THE DO 512 LOOP ESTABLISHES THE NUMBER OF QUOTES, NUMBER OF
C     QUESTION MARKS, AND NUMBER OF EXCLAMATION POINTS IN THE ESSAY
C     ACCORDING TO A PREDETERMINED SCALE.
      DO 512 II#15,17
  512 X11%II<#%FLOAT%TOT%II&4<</SEN<*1000.
C     THE DO 513 LOOP ESTABLISHES THE NUMBER OF PREPOSITIONS,
C     CONNECTIVES, SPELLING ERRORS, RELATIVE PRONOUNS, SUBORDINATING
C     CONJUNCTIONS, AND DALE WORDS FOR THE ESSAY.
      DO 513 II#18,23
  513 X11%II<#%FLOAT%TOT%II&4<</WDS<*100.
C     THE DO 514 LOOP ESTABLISHES THE NUMBER OF SENTENCES WITH NO
C     ENDING PUNCTUATION, DECLARATIVE TYPE A SENTENCES, AND DECLARATIVE
C     TYPE B SENTENCES IN THIS ESSAY.
      DO 514 II#24,26
  514 X11%II<#%FLOAT%TOT%II&4<</SEN<*100.

C     SCALE FOR SUM OF HYPHENS IN ESSAY.
      X11%27<#%FLOAT%TOT%33<</WDS<*1000.

SCALE FOR SUM OF SLASHES IN ESSAY. . . 

^Xi“r?2ra?FLOAT*TOf«^^ 

C     SCALE FOR SUM OF LETTERS THIS ESSAY.
      X11%29<#%FLOAT%TOT%6<</WDS<*100.
C     SQRT OF SUM OF SQUARED LETTERS %TOT%7<< IN ESSAY DIVIDED BY THE
C     NUMBER OF WORDS MINUS SUM OF LETTERS DIVIDED BY 100
C     SQUARED. THIS QUANTITY IS THEN MULTIPLIED BY 100.

      X11%30<#SQRT%FLOAT%TOT%7<</WDS-%X11%29</100.<**2<*100.

C     TEN TIMES THE SQUARE ROOT OF THE QUANTITY SUM OF SQUARE OF WORDS
C     IN EACH SENTENCE DIVIDED BY THE NUMBER OF SENTENCES MINUS THE
C     NUMBER OF WORDS DIVIDED BY THE NUMBER OF SENTENCES TIMES 10
C     SQUARED.
      X11%31<#SQRT%FLOAT%TOT%9<</SEN-%X11%2</10.<**2<*10.

PUNCH bUTPU^^^^ 

C     PRINTED OUTPUT FOR AN ESSAY.
      WRITE%6,965< ID,I1,%X11%II<,II#1,9<,%X11%II<,II#11,16<,ID,I2,
     1%X11%II<,II#17,31<

PUNCHED CARD OUTPUT FOR AN ESSAY. 

WRJTE*7,904< id, I 1,*X11*I I<, H #1 ,9< 

i*x ii * i' i < , I r# 17 , 3 i < ' « . ^ „ c T 

C     REPLACE START BY FALSE SO THAT THE PROGRAM WILL NOT INTERPRET
C     START AS THE BEGINNING OF ANOTHER LINE.
   16 START#.FALSE.
C     INITIALIZING IDENTIFICATION NUMBER, TITLE, SEQUENCE NUMBER OF
C     SENTENCE, AND SEQUENCE NUMBER OF PARAGRAPH.
      DO 14 ICT#1,4
   14 SUMS%ICT<#0
C     THE DO 17 LOOP INITIALIZES THE CUMULATIVE ASPECTS OF THE
C     PROGRAM FOR EACH ESSAY.
      DO 17 ICT#5,34
   17 TOT%ICT<#0
C     BEGIN ANALYZING THE ESSAY BY GOING TO 2.
      GO TO 2
C     SET IDENTIFICATION NUMBER EQUAL TO TOT%1<.
  200 TOT%1<#IDENT
C     WRITE THE TOTALS FOR ESSAY ON TAPE 0.
      WRITE%0< %TOT%II<,II#1,34<
C     WRITE THE TOTALS FOR THE ESSAY ON TAPE 4.
      WRITE%4< %TOT%II<,II#1,34<
C     LAST ENTRY MADE ON TAPE 0.
      END FILE 0
C     LAST ENTRY MADE ON TAPE 4.
      END FILE 4
C     REWIND TAPE 0 %END OF JOB<
      REWIND 0
C     REWIND TAPE 4 %END OF JOB<
      REWIND 4
      RETURN
C     SPECIFIES 6 CHARACTER WORDS FOR PUNCT AND ALPHA.
  899 FORMAT%A6<
C     SPECIFIES SIX 12 CHARACTER WORDS PER LINE FOR WORD LISTS.



tKTP^ T^SPACESf IDENTIFICATION NUMBERt SKIP 2 SPACES» 



TITLE 



  901 FORMAT%I5,9X,A1<
C     SPECIFIES 80 SUCCESSIVE FIELDS OF ONE CHARACTER EACH.
  902 FORMAT%80A1<
C     SPECIFIES PRINT OUT FOR SUMS AND TOTS.
  903 FORMAT%1H0I5,I2,3I3,2I5,2I5,17I3,I4,5I2,2I3<

C     SPECIFIES PUNCHED CARD OUTPUT FOR AN ESSAY %TOTALS<
  904 FORMAT%I5,I1
C     SPECIFIES PRINT OUT FOR AN ESSAY %TOTALS<
  965 FORMAT%1X,I5,I1,15F5.0<
C     SPECIFIES 20 INTEGERS %1 OR 2 DIGITS< FOR RELN.
  905 FORMAT%20I2<
      END
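The scaling that the main program applies to the essay totals can be restated in a modern language. The following Python sketch is an illustration only and is not part of the original program. The function name `scale_essay`, its argument names, and the dictionary keys are hypothetical; the scale factors follow the comments in the listing (average sentence length times 10, letters per word times 100, standard deviation of word length times 100, standard deviation of sentence length times 10). The guard against a slightly negative variance is an addition of this sketch.

```python
import math


def scale_essay(words, sentences, letters, sq_letters, sq_sentence_words):
    """Scale raw essay totals as the listing's comments describe.

    Hypothetical argument names standing in for TOT entries:
    words ~ TOT(8), sentences ~ TOT(3), letters ~ TOT(6),
    sq_letters ~ TOT(7), sq_sentence_words ~ TOT(9).
    """
    wds = float(words)
    sen = float(sentences)

    # Average sentence length in words, times 10 (compare X11(2)).
    avg_sentence_len = wds / sen * 10.0
    # Average word length in letters, times 100 (compare X11(29)).
    avg_word_len = letters / wds * 100.0

    # Standard deviation of word length, times 100 (compare X11(30)).
    var_word = sq_letters / wds - (avg_word_len / 100.0) ** 2
    sd_word_len = math.sqrt(max(var_word, 0.0)) * 100.0

    # Standard deviation of sentence length, times 10 (compare X11(31)).
    var_sent = sq_sentence_words / sen - (avg_sentence_len / 10.0) ** 2
    sd_sentence_len = math.sqrt(max(var_sent, 0.0)) * 10.0

    return {
        "avg_sentence_len_x10": avg_sentence_len,
        "avg_word_len_x100": avg_word_len,
        "sd_word_len_x100": sd_word_len,
        "sd_sentence_len_x10": sd_sentence_len,
    }
```

For a one-sentence essay of two words with 2 and 4 letters, the inputs would be `words=2, sentences=1, letters=6, sq_letters=20, sq_sentence_words=4`.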
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C 

     7PR<,%SUMS%26<,NSCONJ<,%SUMS%27<,NDALE<,%SUMS%28<,ENDPCT<,%SUMS%29
     8<,SENTYP<
      EQUIVALENCE%TOT%1<,TID<,%TOT%2<,TTITLE<,%TOT%3<,NUMSEN<,%TOT%4<,N
     1UMPAR<,%TOT%5<,SVOPEN<,%TOT%6<,TOTLET<,%TOT%7<,TSQLET<,%TOT%8<,TOT
     2WDS<,%TOT%9<,TWDSQ<,%TOT%10<,TPAREN<,%TOT%11<,TAP
     3OS<,%TOT%12<,TCOMMA<,%TOT%13<,TPER<,%TOT%14<,TPERCT<,%TOT%15<,TUND
     4ER<,%TOT%16<,TDASH<,%TOT%17<,TCOLON<,%TOT%18<,TSEMIC<,%TOT%19<,TQU
     5OTE<,%TOT%20<,TEXCLA<,%TOT%21<,TQUES<,%TOT%22<,TPREP<,%TOT%23<,TCO
     6NN<,%TOT%24<,TSPELL<,%TOT%25<,TRELPR<,%TOT%26<,TSCONJ<,%TOT%27<,TD
     7ALE<,%TOT%28<,TENDPT<,%TOT%29<,TTYPE<
      EQUIVALENCE %SUMS%33<,NHYPH<,%TOT%33<,THYPH<
      EQUIVALENCE %SUMS%34<,NSLASH<,%TOT%34<,TSLASH<
      INTEGER NSLASH,TSLASH
      INTEGER NHYPH,THYPH
      EQUIVALENCE %NUNDER,NITAL<
      INTEGER LENGTH%100<
      REAL RDTBL%10540<
      EQUIVALENCE %RDTBL%1<,SUBCON<,%RDTBL%41<,PREP<,%RDTBL%141<,RELPRO<
     1,%RDTBL%161<,CONNEC<,%RDTBL%221<,DALE<,%RDTBL%6221<,SPELLX<,%RDTBL
     2%10221<,DECLAB<,%RDTBL%10241<,SVOPN<
      REAL HLFTXT%200<
      EQUIVALENCE %HLFTXT,TEXT<
      COMMON/LOG/SENTND,ESSEND
      LOGICAL SENTND,ESSEND
      EQUIVALENCE%PUNCT%17<,QUOTE<,%PUNCT%18<,PERCT<
      REAL QUOTE,PERCT
      COMMON/PSUM/ RELN,NEXT
      INTEGER RELN%20<,NEXT
C     RELN%I< CONTAINS THE SUMS SUBSCRIPT CORRESPONDING TO PUNCT%I<
      COMMON/LIST2/ SWORD
      DOUBLE PRECISION SWORD%10<
      REAL SWRD%20<
      EQUIVALENCE %SWORD,SWRD<



C     SNTORG ANALYSES EACH CHARACTER OF THE LINE OF THE ESSAY THAT
C     WAS JUST READ IN. IT FINDS WORDS AND PUNCTUATION AND
C     ASSEMBLES THE CHARACTERS INTO SENTENCE COMPONENTS
C     THIS PART ALSO MAINTAINS COUNTS ON SIGNIFICANT SENTENCE ELEMENTS
C     AN AREA SHARED BY THIS SUBROUTINE AND THE MAIN PROGRAM.
      COMMON/COUNT/ ICTR
C     TYPING IMAGE COUNTER AS AN INTEGER.
      INTEGER ICTR
C     TYPING THE VARIABLES ONE, TWO, AND THREE AS LOGICAL.
      LOGICAL ONE,TWO,THREE
C     TYPING CT AND IRELN AS INTEGERS.
      INTEGER CT,IRELN
C     TYPING TEMPA AS A DOUBLE PRECISION WORD.
      DOUBLE PRECISION TEMPA
C     TYPING THE ARRAY TEMPB AS REAL.
      REAL TEMPB%2<
C     MAKING TEMPA AND TEMPB EQUIVALENT.
      EQUIVALENCE %TEMPA,TEMPB<
C     AN AREA SHARED WITH SUBROUTINE TYPSEN %TYPE OF
C     SENTENCE<
      COMMON/ENDS/ ALSPER,ALSEXC,ALSQUS



C     TYPING THE ARRAYS AS REAL.
      REAL ALSPER%2<,ALSEXC%2<,ALSQUS%2<
C     THIS CALL STATEMENT RESULTS IN SETTING ALSPER EQUAL TO A
C     PUNCTUATION MARK %PERIOD< WITHIN QUOTES.
      CALL DATA %ALSPER,6H.* <
C     THIS CALL STATEMENT RESULTS IN SETTING ALSEXC EQUAL TO A
C     PUNCTUATION MARK %EXCLAMATION POINT<
      CALL DATA %ALSEXC,6H.X>^ <
C     THIS CALL STATEMENT RESULTS IN SETTING ALSQUS EQUAL TO A
C     PUNCTUATION MARK %QUESTION MARK< WITHIN QUOTES.
C     TEST WHETHER SENTENCE END IS TRUE. IF SO, RETURN TO
C     THE MAIN PROGRAM. IF NOT, CONTINUE.
      IF%SENTND< RETURN
C     INITIALIZE LETTER COUNTER.
      LCTR#0
C     A LOGICAL TEST TO DETERMINE IF IMAGE COUNTER GREATER THAN OR EQUAL
C     TO 81. IF SO, REPLACE ICTR BY ONE AND CONTINUE. IF NOT, CONTINUE
      IF%ICTR.GE.81< ICTR#1
C     SAME AS PRECEDING STATEMENT EXCEPT RETURN IF TRUE, I.E. THE CARD
C     IMAGES HAVE ALL BEEN ANALYZED.
      IF%ICTR.GE.81.AND.LCTR.NE.0< GO TO 200
      IF%ICTR.GE.81< RETURN
C     SET LOGICAL CONSTANT TRUE EQUAL TO ONE.
      ONE#.TRUE.
C     SET LOGICAL CONSTANT TRUE EQUAL TO TWO.
      TWO#.TRUE.
C     SET LOGICAL CONSTANT TRUE EQUAL TO THREE.
      THREE#.TRUE.
C     THE DO 2 LOOP IS ESTABLISHED TO FIND A PUNCTUATION MARK. IF SO,
C     BREAK OUT OF THE LOOP. IF NOT, CONTINUE
      DO 2 CT#2,9
C     A LOGICAL TEST TO DETERMINE IF IMAGES 2-9 ARE PUNCTUATION MARKS,
C     POINT%3<,COMMA%4<,HYPHEN%5<,APOSTROPHE%6<,
C     OPEN PARENTHESIS%7<,CLOSED PARENTHESIS%8<,SLASH%9<. IF SO, GO
C     TO THREE.
    2 IF%BROKUP%ICTR<.EQ.PUNCT%CT<< GO TO 3
C     A LOGICAL TEST TO DETERMINE IF THERE IS AN IMAGE. IF SO, GO TO
C     300 TO WORK WITH THE IMAGE. IF NOT, CONTINUE.
C     A LOGICAL TEST TO DETERMINE IF LETTER COUNTER IS NOT EQUAL TO
C     ZERO. IF SO, GO TO 200. IF NOT, CONTINUE.
      IF%LCTR.NE.0< GO TO 200
C     IF ABOVE THREE TESTS FALSE GO TO 400.
C     IF A PUNCTUATION MARK THEN COMPARE WITH
C     PREESTABLISHED WORD PUNCTUATION MARKS.
    3 BRKUP%1<#BROKUP%ICTR<
      BRKUP%2<#BROKUP%ICTR&1<
      BRKUP%3<#BROKUP%ICTR&2<
      IF%ICTR.GT.79< BRKUP%2<#BLANK
      IF%ICTR.GT.78< BRKUP%3<#BLANK
C     IF ANY OR ALL ARE TRUE AND LETTER COUNTER IS ZERO GO
C     TO 100 FOR SUMMATIONS BECAUSE IT IS THE END OF
C     A SENTENCE
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C 

      IF%%%TEMPB%1<.EQ.PERIOD<.OR.%TEMPB%1<.EQ.EXCLAM<.OR.%TEMPB%1<.EQ.Q
     1UEST<<.AND.LCTR.EQ.0< GO TO 100
C     SAME AS COMMENT ABOVE.
      IF%%%TEMPB%1<.EQ.ALSPER<.OR.%TEMPB%1<.EQ.ALSEXC<.OR.%TEMPB%1<.EQ.
     1ALSQUS<<.AND.LCTR.EQ.0< GO TO 100
C     REPLACE THREE BY FALSE.
      THREE#.FALSE.
C     IF LETTER COUNT NOT EQUAL TO ZERO GO TO 4 FOR FURTHER CHECKING
      IF%LCTR.NE.0< GO TO 4
C     IF TEMPB%1< EQUAL TO AN ITALIC MARK GO TO 100 FOR FURTHER
C     PROCESSING.
      IF%TEMPB%1<.EQ.ITALIC< GO TO 100
      CALL PACK%BRKUP%1<,TEMPA,2<
C     REPLACE TWO BY FALSE.
C     THE DO 5 LOOP SETS UP THE APPROPRIATE INDEX FOR STATEMENTS
C     FOLLOWING STATEMENT 100 AND ALSO DETERMINES IF TEMPB%1< IS EQUAL
C     TO A PERIOD, COLON, SEMICOLON, EXCLAMATION, QUESTION, DASH,
C     ITALIC, QUOTE, OR A PER CENT.
      DO 5 CT#10,18
      IRELN#RELN%CT<
    5 IF%TEMPB%1<.EQ.PUNCT%CT<< GO TO 100
C     REPLACE TWO BY TRUE.
      TWO#.TRUE.
C     REPLACE ONE BY FALSE.
C     AN ASSIGNED GO TO STATEMENT WHICH ELIMINATES FOLLOWING TEST.
      GO TO 100
C     IF IMAGE IS EQUAL TO AN APOSTROPHE GO TO 500
    4 IF%BROKUP%ICTR<.EQ.APOSTR< GO TO 500
C     ASSIGN THE MINIMUM VALUE OF THE TWO ARGUMENTS EQUAL TO NEXT.
  100 NEXT#MIN0%NEXT&1,100<
C     REPLACE LENGTH BY 99 WHICH IS THE END OF THE SENTENCE.
C     IF TRUE GO TO 101 AND BEGIN SUMMING PUNCTUATION MARKS.
C     SUM THE APPROPRIATE PUNCTUATION MARKS AS INDICATED BY THE
C     SUBSCRIPT IRELN.
C     AN ASSIGNED GO TO WHICH ELIMINATES THE FOLLOWING 9
C     TESTS.
C     IF ONE AND NOT THREE ARE TRUE INCREASE THE NUMBER OF ITALICIZED
C     WORDS BY ONE.
  101 IF%ONE.AND.%.NOT.THREE<< NITAL#NITAL&1
C     IF NOT ONE OR THREE TRUE GO TO 107.
      IF%.NOT.%ONE.OR.THREE<< GO TO 107
C     SET END OF SENTENCE PUNCTUATION EQUAL TO ONE.
C     IF TRUE INCREASE THE NUMBER OF PERIODS BY ONE.
      IF%TEMPB%1<.EQ.PERIOD< NPER#NPER&1
C     IF TRUE INCREASE THE NUMBER OF PERIODS BY ONE.
      IF%TEMPB%1<.EQ.ALSPER< NPER#NPER&1
C     IF TRUE INCREASE THE NUMBER OF EXCLAMATIONS BY ONE.
      IF%TEMPB%1<.EQ.EXCLAM< NEXCLA#NEXCLA&1
C     IF TRUE INCREASE THE NUMBER OF EXCLAMATIONS BY ONE
      IF%TEMPB%1<.EQ.ALSEXC< NEXCLA#NEXCLA&1
C     IF TRUE INCREASE THE NUMBER OF QUESTION MARKS BY ONE.
      IF%TEMPB%1<.EQ.QUEST< NQUES#NQUES&1
C     IF TRUE INCREASE THE NUMBER OF QUESTION MARKS BY ONE.
      IF%TEMPB%1<.EQ.ALSQUS< NQUES#NQUES&1
C     ASSIGNED GO TO STATEMENT.
      GO TO 102
C     THE DO 109 LOOP DETERMINES IF ANY OF THE IMAGES %2-9< ARE
C     PUNCTUATION MARKS. IF SO, REPLACE IRELN BY RELN%CT<.
  107 DO 109 CT#2,9
  109 IF%BROKUP%ICTR<.EQ.PUNCT%CT<< IRELN#RELN%CT<
C     INCREMENT SUMS BY ONE FOR APPROPRIATE PUNCTUATION MARK.
      SUMS%IRELN<#SUMS%IRELN<&1
C     IF ONE IS TRUE INCREMENT LETTER COUNTER BY TWO.
  102 IF%ONE<  LCTR#LCTR&2
C     IF TWO IS TRUE INCREMENT LETTER COUNTER.
      IF%TWO<  LCTR#LCTR&1
      CALL PACK%BRKUP%1<,TEXT%NEXT<,LCTR<
C     REPLACE IMAGE COUNTER BY VALUE IN IMAGE COUNTER PLUS VALUE IN
C     LETTER COUNTER.
      ICTR#ICTR&LCTR
C     INITIALIZE LETTER COUNTER.
      LCTR#0
C     IF NOT THREE TRUE GO TO ONE, OTHERWISE CONTINUE.
      IF%.NOT.THREE< GO TO 1
C     DECREMENT IMAGE COUNTER BY ONE.
      ICTR#ICTR-1
C     SET SENTENCE END EQUAL TO TRUE.
      SENTND#.TRUE.
C     REPLACE NUMBER OF WORDS SQUARED BY NUMBER OF WORDS TIMES
C     NUMBER OF WORDS.
      NWDSQ#NUMWDS*NUMWDS
C     RETURN TO ORIGINAL PROGRAM.
      RETURN
C     SET NEXT EQUAL TO THE MINIMUM VALUE OF THE TWO ARGUMENTS.
  200 NEXT#MIN0%NEXT&1,100<
C     LENGTH OF THE PARTICULAR WORD IS SET EQUAL TO THE CONSTANT IN
C     LETTER COUNTER %LCTR<.
      LENGTH%NEXT<#LCTR
C     REPLACE IBACK WITH IMAGE COUNTER VALUE MINUS LETTER COUNTER
C     VALUE.
      IBACK#ICTR-LCTR
C     PACKING EACH WORD FOR ANALYSIS.
      CALL PACK%BROKUP%IBACK<,TEXT%NEXT<,LCTR<
C     REPLACE NUMBER OF WORDS BY NUMBER OF WORDS PLUS ONE %ACCUMULATING
C     WORDS IN SENTENCE<.
      NUMWDS#NUMWDS&1
C     REPLACE SUM OF SQUARED LETTERS BY WHAT IS IN SUM OF SQUARED
C     LETTERS PLUS LETTER COUNTER VALUE SQUARED.
      SSQLET#SSQLET&LCTR*LCTR
C     REPLACE LETTER COUNTER WITH 0 AND CONTINUE WITH NEXT WORD.
      LCTR#0
C     CONTINUE WITH NEXT IMAGE.
      GO TO 1
C     SUMMING THE LETTERS IN THE SENTENCE.
  300 SUMLET#SUMLET&1
C     FINDING THE LENGTH OF A WORD.
      LCTR#LCTR&1
C     SUMMING THE IMAGES ON A CARD TO DETERMINE WHEN ALL IMAGES HAVE
C     BEEN PROCESSED.
      ICTR#ICTR&1
C     CONTINUE WITH NEXT IMAGE
      GO TO 1
C     SUMMING THE IMAGES ON A CARD TO DETERMINE WHEN ALL IMAGES HAVE
C     BEEN PROCESSED.
  400 ICTR#ICTR&1
C     CONTINUE WITH NEXT IMAGE.
      GO TO 1
C     INCREASE THE NUMBER OF APOSTROPHES BY ONE.
  500 NAPOS#NAPOS&1
C     INCREMENT LETTER COUNTER BY ONE.
      LCTR#LCTR&1
C     INCREMENT IMAGE COUNTER BY ONE.
      ICTR#ICTR&1
C     CONTINUE WITH PROGRAM.
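The character-by-character scan that SNTORG's comments describe (letters extend the current word, an apostrophe inside a word is counted and kept, a blank or punctuation mark closes the word) can be sketched in Python. This is an illustration under stated assumptions and not a transcription of the subroutine: the name `scan_card`, the `PUNCTUATION` set, and the dictionary keys are hypothetical, and the sketch ignores the sentence-end and italic handling of the original.

```python
# Hypothetical punctuation set; the original reads its marks from cards.
PUNCTUATION = set(".,;:!?()/-*")


def scan_card(card):
    """Split one 80-column card image into words and keep simple counts."""
    words = []
    current = []  # characters of the word being assembled
    counts = {"letters": 0, "sq_letters": 0, "words": 0,
              "apostrophes": 0, "punct": 0}

    def finish_word():
        # Close the current word, as at statement 200 in the listing:
        # count it and add its squared length to the running sum.
        if current:
            words.append("".join(current))
            counts["words"] += 1
            counts["sq_letters"] += len(current) ** 2
            current.clear()

    for ch in card[:80]:
        if ch == "'" and current:
            # Apostrophe inside a word: counted, and it lengthens the word.
            counts["apostrophes"] += 1
            current.append(ch)
        elif ch == " ":
            finish_word()
        elif ch in PUNCTUATION or ch == "'":
            finish_word()
            counts["punct"] += 1
        else:
            counts["letters"] += 1
            current.append(ch)
    finish_word()
    return words, counts
```

Calling `scan_card("DON'T STOP.")` splits the card into the words `DON'T` and `STOP`; the apostrophe adds to the word's length but not to the letter count, which mirrors the separate treatment at statement 500.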










     4CT<,%SUMS%15<,NUNDER<,%SUMS%16<,NDASH<,%SUMS%17<,NCOLON<,%SUMS%18
     5<,NSEMIC<,%SUMS%19<,NQUOTE<,%SUMS%20<,NEXCLA<,%SUMS%21<,NQUES<,%S
     6UMS%22<,NPREP<,%SUMS%23<,NCONN<,%SUMS%24<,NSPELL<,%SUMS%25<,NREL
     7PR<,%SUMS%26<,NSCONJ<,%SUMS%27<,NDALE<,%SUMS%28<,ENDPCT<,%SUMS%29
     8<,SENTYP<
      EQUIVALENCE%TOT%1<,TID<,%TOT%2<,TTITLE<,%TOT%3<,NUMSEN<,%TOT%4<,N
     1UMPAR<,%TOT%5<,SVOPEN<,%TOT%6<,TOTLET<,%TOT%7<,TSQLET<,%TOT%8<,TOT
     2WDS<,%TOT%9<,TWDSQ<,%TOT%10<,TPAREN<,%TOT%11<,TAP
     3OS<,%TOT%12<,TCOMMA<,%TOT%13<,TPER<,%TOT%14<,TPERCT<,%TOT%15<,TUND
     4ER<,%TOT%16<,TDASH<,%TOT%17<,TCOLON<,%TOT%18<,TSEMIC<,%TOT%19<,TQU
     5OTE<,%TOT%20<,TEXCLA<,%TOT%21<,TQUES<,%TOT%22<,TPREP<,%TOT%23<,TCO
     6NN<,%TOT%24<,TSPELL<,%TOT%25<,TRELPR<,%TOT%26<,TSCONJ<,%TOT%27<,TD
     7ALE<,%TOT%28<,TENDPT<,%TOT%29<,TTYPE<
      INTEGER LENGTH%100<
      REAL RDTBL%10540<
      EQUIVALENCE %RDTBL%1<,SUBCON<,%RDTBL%41<,PREP<,%RDTBL%141<,RELPRO<
     1,%RDTBL%161<,CONNEC<,%RDTBL%221<,DALE<,%RDTBL%6221<,SPELLX<,%RDTBL
     2%10221<,DECLAB<,%RDTBL%10241<,SVOPN<
      REAL HLFTXT%200<
      EQUIVALENCE %HLFTXT,TEXT<
      COMMON/LOG/SENTND,ESSEND
      LOGICAL SENTND,ESSEND
      EQUIVALENCE%PUNCT%17<,QUOTE<,%PUNCT%18<,PERCT<
      REAL QUOTE,PERCT
      COMMON/PSUM/ RELN,NEXT
      INTEGER RELN%20<,NEXT
C     RELN%I< CONTAINS THE SUMS SUBSCRIPT CORRESPONDING TO PUNCT%I<
      COMMON/LIST2/ SWORD
      DOUBLE PRECISION SWORD%10<
      REAL SWRD%20<
      EQUIVALENCE %SWORD,SWRD<



C     THIS PART CHECKS A WORD AGAINST A LIST OF COMMONLY MISSPELLED WORDS
C     TYPING THE VARIABLE WORD AS DOUBLE PRECISION.
      DOUBLE PRECISION WORD
    2 NSPELL#NSPELL&1
C     RETURN TO MAIN PROGRAM.
      RETURN
      END
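Only the tail of the spelling subroutine survives in this copy: a word is compared with the list of commonly misspelled words, and NSPELL is incremented on a match. A minimal Python sketch of that idea follows. It is not the original search: the function name `count_misspellings` and its arguments are hypothetical, a set lookup is swapped in for whatever table search the subroutine used, and the 12-character width reflects the listing's remark that each double precision word holds 1-12 characters.

```python
def count_misspellings(words, misspelled_list, width=12):
    """Count words that appear in a list of commonly misspelled forms.

    Words are upper-cased, then truncated or blank-padded to `width`
    characters, imitating fixed-width 12-character storage.
    """
    def key(word):
        return word.upper()[:width].ljust(width)

    table = {key(w) for w in misspelled_list}
    return sum(1 for w in words if key(w) in table)
```

With the hypothetical list `["recieve", "seperate"]`, the sentence words `["I", "RECIEVE", "mail", "recieve"]` contain two matches.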



$IBFTC TSEN 



2 

      SUBROUTINE TYPSEN
      COMMON/IN/BROKUP,TEXT,LENGTH
      REAL BROKUP%80<
      DOUBLE PRECISION TEXT%100<
      COMMON/LISTS/SUBCON,PREP,RELPRO,CONNEC,DALE,SPELLX,DECLAB,SVOPN
      DOUBLE PRECISION SUBCON%20<,PREP%50<,RELPRO%10<,CONNEC%30<,DALE
     1%3000<,SPELLX%2000<,DECLAB%10<,SVOPN%150<
      COMMON/CHAR/PUNCT,ALPHA
      REAL PUNCT%20<,BLANK,STAR,DECPT,COMMA,HYPHEN,APOSTR,OPAREN,CPAREN,
     1SLASH,PERIOD,COLON,SEMIC,EXCLAM,QUEST,DASH,ITALIC,ALPHA%26<,A,B,C,
     2D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z
      EQUIVALENCE %PUNCT%1<,BLANK<,%PUNCT%2<,STAR<,%PUNCT%3<,DECPT<,%PUN
     1CT%4<,COMMA<,%PUNCT%5<,HYPHEN<,%PUNCT%6<,APOSTR<,%PUNCT%7<,OPAREN<
     2,%PUNCT%8<,CPAREN<,%PUNCT%9<,SLASH<,%PUNCT%10<,PERIOD<,%PUNCT%11<,
     3COLON<,%PUNCT%12<,SEMIC<,%PUNCT%13<,EXCLAM<,%PUNCT%14<,QUEST<,%PUN
     4CT%15<,DASH<,%PUNCT%16<,ITALIC<,%ALPHA%1<,A<,%ALPHA%2<,B<,%ALPHA%3
     5<,C<,%ALPHA%4<,D<,%ALPHA%5<,E<,%ALPHA%6<,F<,%ALPHA%7<,G<,%ALPHA%8<
     6,H<,%ALPHA%9<,I<,%ALPHA%10<,J<,%ALPHA%11<,K<,%ALPHA%12<,L<,%ALPHA%
     713<,M<,%ALPHA%14<,N<,%ALPHA%15<,O<,%ALPHA%16<,P<,%ALPHA%17<,Q<,%AL
     8PHA%18<,R<,%ALPHA%19<,S<,%ALPHA%20<,T<,%ALPHA%21<,U<,%ALPHA%22<,V<
     9,%ALPHA%23<,W<,%ALPHA%24<,X<,%ALPHA%25<,Y<,%ALPHA%26<,Z<
      COMMON/OUT/SUMS,TOT
      INTEGER SUMS%100<,ID,TITLE,SENNUM,PARNUM,SUBVER,SUMLET,SSQLET,NUMW
     1DS,NWDSQ,NPAREN,NAPOS,NCOMMA,NPER,NPERCT,NUNDER,NDASH,NCOLO
     2N,NSEMIC,NQUOTE,NEXCLA,NQUES,NPREP,NCONN,NSPELL,NRELPR,NSCONJ,NDAL
     3E,ENDPCT,SENTYP%4<,TOT%100<,TID,TTITLE,NUMSEN,NUMPAR,SVOPEN,TOTLET
     4,TSQLET,TOTWDS,TWDSQ,TOTFND,TPAREN,TAPOS,TCOMMA,TPER,TPERCT,TUNDER
     5,TDASH,TCOLON,TSEMIC,TQUOTE,TEXCLA,TQUES,TPREP,TCONN,TSPELL,TRELPR
     6,TSCONJ,TDALE,TENDPT,TTYPE%4<
      EQUIVALENCE %SUMS%1<,ID<,%SUMS%2<,TITLE<,%SUMS%3<,SENNUM<,%SUMS%4<
     1,PARNUM<,%SUMS%5<,SUBVER<,%SUMS%6<,SUMLET<,%SUMS%7<,SSQLET<,%SUMS%
     28<,NUMWDS<,%SUMS%9<,NWDSQ<,%SUMS%10<,NPAREN<,%S
     3UMS%11<,NAPOS<,%SUMS%12<,NCOMMA<,%SUMS%13<,NPER<,%SUMS%14<,NPER
     4CT<,%SUMS%15<,NUNDER<,%SUMS%16<,NDASH<,%SUMS%17<,NCOLON<,%SUMS%18
     5<,NSEMIC<,%SUMS%19<,NQUOTE<,%SUMS%20<,NEXCLA<,%SUMS%21<,NQUES<,%S
     6UMS%22<,NPREP<,%SUMS%23<,NCONN<,%SUMS%24<,NSPELL<,%SUMS%25<,NREL
     7PR<,%SUMS%26<,NSCONJ<,%SUMS%27<,NDALE<,%SUMS%28<,ENDPCT<,%SUMS%29
     8<,SENTYP<
      EQUIVALENCE%TOT%1<,TID<,%TOT%2<,TTITLE<,%TOT%3<,NUMSEN<,%TOT%4<,N
     1UMPAR<,%TOT%5<,SVOPEN<,%TOT%6<,TOTLET<,%TOT%7<,TSQLET<,%TOT%8<,TOT
     2WDS<,%TOT%9<,TWDSQ<,%TOT%10<,TPAREN<,%TOT%11<,TAP
     3OS<,%TOT%12<,TCOMMA<,%TOT%13<,TPER<,%TOT%14<,TPERCT<,%TOT%15<,TUND
     4ER<,%TOT%16<,TDASH<,%TOT%17<,TCOLON<,%TOT%18<,TSEMIC<,%TOT%19<,TQU
     5OTE<,%TOT%20<,TEXCLA<,%TOT%21<,TQUES<,%TOT%22<,TPREP<,%TOT%23<,TCO
     6NN<,%TOT%24<,TSPELL<,%TOT%25<,TRELPR<,%TOT%26<,TSCONJ<,%TOT%27<,TD
     7ALE<,%TOT%28<,TENDPT<,%TOT%29<,TTYPE<
      EQUIVALENCE %SUMS%33<,NHYPH<,%TOT%33<,THYPH<
      EQUIVALENCE %SUMS%34<,NSLASH<,%TOT%34<,TSLASH<
      INTEGER NSLASH,TSLASH
      INTEGER NHYPH,THYPH
      EQUIVALENCE %NUNDER,NITAL<
      INTEGER LENGTH%100<
      REAL RDTBL%10540<
      EQUIVALENCE %RDTBL%1<,SUBCON<,%RDTBL%41<,PREP<,%RDTBL%141<,RELPRO<
     1,%RDTBL%161<,CONNEC<,%RDTBL%221<,DALE<,%RDTBL%6221<,SPELLX<,%RDTBL
     2%10221<,DECLAB<,%RDTBL%10241<,SVOPN<
      REAL HLFTXT%200<
      EQUIVALENCE %HLFTXT,TEXT<
      COMMON/LOG/SENTND,ESSEND
      LOGICAL SENTND,ESSEND
      EQUIVALENCE%PUNCT%17<,QUOTE<,%PUNCT%18<,PERCT<
      REAL QUOTE,PERCT
      COMMON/PSUM/ RELN,NEXT
      INTEGER RELN%20<,NEXT
C     RELN%I< CONTAINS THE SUMS SUBSCRIPT CORRESPONDING TO PUNCT%I<
      COMMON/LIST2/ SWORD
      DOUBLE PRECISION SWORD%10<
      REAL SWRD%20<
      EQUIVALENCE %SWORD,SWRD<
      COMMON/ENDS/ ALSPER,ALSEXC,ALSQUS
      REAL ALSPER%2<,ALSEXC%2<,ALSQUS%2<
      LOGICAL INTABL



C
C     THIS PART TYPES EACH SENTENCE ACCORDING TO ITS END PUNCTUATION
C     AND FIRST WORD TYPE
C
C     REPLACE LAST BY TWO TIMES NEXT MINUS ONE.
      LAST=2*NEXT-1
C
C     IF SENTENCE ENDS WITH A PERIOD GO TO 100 TO DETERMINE TYPE OF
C     DECLARATIVE SENTENCE.
      IF(HLFTXT(LAST).EQ.PERIOD.OR.HLFTXT(LAST).EQ.ALSPER) GO TO 100



C 

C 



IF SENTENCE ENDS WITH AN EXCLAMATION MARK INCREASE SENTENCE 

TYPE %3< BY ONE. ^ — 

TF^HIIFT'X T?L'A'Sf< . £Q . E'XC LA M.OR.HLFTXTi:LAST<.EQ.AL3EXC<SFNTYP«3<n 
IF SENTENCE FMn _^L^ A .ODfL T I ON MARK INCREASE SENTE N C £_ TTPL— 



IF1?HLFTXT^LAST<.EQ.QUEST .or.hlftxt^last<.eq.alsqus<sentyp%a<#_^ 



r c 



RETURN TO 
RETURN 



MAIN PROGRAM, 



C 

100 _ 

c"- 



TEST TO DETERMINE IF bECLARATIVE SENTENCE IS A PARTICULAR TYPE. 

I F f INTABL^H LFTXT tP EC LA B 1 20 << IQ, LO? nw p 

IF 100 FALSE THEN INCREASE DECLARATIVE TYPE A SENTENCE BY ONE. 



SENTYP^8K#1 



I 



C 

l(n 

c 



RETURN TO MAIN PROGRAM. 
RETURJN_ _ 

IF 100 TRUE 
SENTYP«2<#1 



INCREASE OEClARAri^^^^^ ■TYPE B SENTENCE BY ONE. 



RETURN TO 
RETURN 



MAIN PROGRAM. 



END 
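The comments in the routine above describe a small decision rule: a sentence ending in a period is declarative (type A or type B, depending on whether its first word is found in the DECLAB table), while exclamation and question marks each bump their own counter. The following is a minimal Python sketch of that rule, not the original program. The set `DECLARATIVE_B_OPENERS` is a hypothetical stand-in for the DECLAB table, whose real entries are read from the data deck, and the label strings are illustrative names for the SENTYP counters.

```python
from collections import Counter

# Hypothetical stand-in for the DECLAB table of first words (assumption).
DECLARATIVE_B_OPENERS = {"THERE", "IT"}


def classify_sentence(first_word, end_mark, openers=DECLARATIVE_B_OPENERS):
    """Return a label for one sentence from its first word and end punctuation."""
    if end_mark == ".":
        # Declarative: type B if the first word is in the table, else type A.
        if first_word.upper() in openers:
            return "declarative_b"
        return "declarative_a"
    if end_mark == "!":
        return "exclamation"
    if end_mark == "?":
        return "question"
    return None  # no recognised end punctuation, so nothing is counted


def tally(sentences):
    """Count sentence types over (first_word, end_mark) pairs."""
    counts = Counter()
    for first_word, end_mark in sentences:
        label = classify_sentence(first_word, end_mark)
        if label is not None:
            counts[label] += 1
    return counts
```

A call such as `tally([("The", "."), ("Why", "?")])` returns one count per sentence type seen, which corresponds to incrementing one element of SENTYP per sentence.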

$IBFTC CHK
      SUBROUTINE CHKLST (WORD,XX)
      COMMON/IN/BROKUP,TEXT,LENGTH
      REAL BROKUP(80)
      DOUBLE PRECISION TEXT(100)
      COMMON/LISTS/SUBCON,PREP,RELPRO,CONNEC,DALE,SPELLX,DECLAB,SVOPN

OOUB L E PRECISION S UBCONX20 < t PR E PX„5 0 < f REL PR0?10< , CONNEC «3 0< t D AL E 

«3000< , SPE"LL XX2'0b”0< , DECX'ABXl d< f S VO^PN 
CDMMQN/CHAR/PUNCT.ALPHA 



PEG 

PEG 



,l3000<» SPELL A 42 UUUS f UtULAD»luv.f 0 ¥ur)>i* 

COMMON /CHAR/ PUNCT, ALPHA 

REAL PUNCT*20<fBLANKfSTARf DECPTfCOMMAf HYPHENf APOSTRf OPARENf CPAREN 
SLASH f PE RIOD fCOLONfSEMICf EXCLAM f QUEST fPASHf ITAtlCf ALPHAX26<t A»8|C 
uTyTvTt 



1 SLASH f PERIOD fCOLONfSEMICfEXCLAMyQUES 
2DfEfFfGfHf I f JfKfLfMfNfOfP tQ^RtSyTtUf 



VfWfXfYfZ 



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50TE < f %T0 T%2Q <t TEXC LA< t *T 0TaS 2KfTQU ES<f ^TC 

6NN<f *T0T%2V<fTSPELL<f *tbtl25<fTRELPR<f *TC 

7ALE<f *TOT %2 8< t TENOy'< t ?T0T«29<tTn 

’ EQU I VALENCE ~ *S'UMSX3 3 < t NH YPH< t Itot %3 3 < ? T HYP H< 
EQUIVALENCE 3SSUMS%34<f NSLASH<y tT0T1S34<f T$LA$H< 



INTEGER 

INTEGER 



NSLASHfTSLASH 
NHYPHfTHYPH 






n 



EQUIVALE NCE " ^NUNDER , N I T AL< 

II^EGER LENGTH «100< 

REAL 



PEG 



EQUIVALENCE %RDTBLa^l<f SUBCON<f 3?RDTBL^4Kf P REP<f ^RDTBL%l^l<f R£LPRO<PEG 1 

l,XRDTBL*16KfC0NNEC<f «R0TBLX22Kf0ALE<f %RDTBL«622Kf SPELLX<f*ROTBLPEG I 

2*10221<f0ECLAB<,%RDTBL«1024KfSV0PN< 1 



f: 



REAL HLFTXtX200< 
EQUIVALENCE IXTf TEXT< 



COMMON/LOG/S ENtNbf ES S END 
logical SENTNDtESSEND 

EQUIVALENCEXPUNCT«17<fQUOTE<»«PUNCT«l8<fPtKCi<. 






REAL QUOTE, PERCT 



rOMMON/PSUM/ RELNlfNEXT 
INTEGER RE LN%2Q<,NEXT 
CONTAINS 



RELN Xl< CONTAINS THESUMS 
C0HM0N/LIST2/ SWORD 



SUBSCRIPT CORRESPONbfNG TO PUNCTXK 



double PRECISION SW0R0%10< 



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APPENDIX A (Continued) 

REAL $WRDt20< 

EQUIVALENCE «SWORDtSWRD< 

C
C     THIS PART CLASSIFIES WORDS OF THE SENTENCE AS PREPOSITIONS,
C     RELATIVE PRONOUNS, SUBORDINATE CONJUNCTIONS, CONNECTIVES,
C     AND/OR ONE OF THE 3000 COMMON WORDS ON THE DALE LIST
C     TYPING WORD AS DOUBLE PRECISION, I.E. IT CAN CONTAIN FROM 1-12
C     CHARACTERS.
      DOUBLE PRECISION WORD
C     TYPING XX AND YY AS LOGICAL VARIABLES.
      LOGICAL XX,YY
C     TYPING THE FUNCTION INTABL AS LOGICAL.
      LOGICAL INTABL
C     REPLACING YY BY THE VALUE OF XX.
      YY=XX
C     REPLACING XX BY THE LOGICAL CONSTANT TRUE.
      XX=.TRUE.
C     TEST TO DETERMINE IF THE WORD IS A PREPOSITION.
      IF(INTABL(WORD,PREP,100)) GO TO 100
C     YY WILL BE TRUE FOR FIRST AND SECOND WORD OF THE SENTENCE.
C     AVOID CHECKING AS BEING RELATIVE PRONOUNS.
      IF(YY) GO TO 6
C     TEST TO DETERMINE IF THE WORD IS A RELATIVE PRONOUN.
      IF(INTABL(WORD,RELPRO,20)) GO TO 200
C     TEST TO DETERMINE IF THE WORD IS A SUBORDINATE CONJUNCTION.
    6 IF(INTABL(WORD,SUBCON,40)) GO TO 300
C     TEST TO DETERMINE IF THE WORD IS IN THE LIST OF DALE WORDS.
      IF(INTABL(WORD,DALE,6000)) GO TO 400
C     TEST TO DETERMINE IF THE WORD IS A CONNECTIVE.
      IF(INTABL(WORD,CONNEC,60)) GO TO 500
C     ASSIGN THE LOGICAL CONSTANT FALSE TO XX.
      XX=.FALSE.
C     RETURN TO CALLING PROGRAM.
      RETURN
C     INCREMENT NUMBER OF PREPOSITIONS BY ONE.
  100 NPREP=NPREP+1
C     ASSIGNED GO TO FOR ANOTHER INCREMENT.
      GO TO 400
C     INCREMENT NUMBER OF RELATIVE PRONOUNS BY ONE.
  200 NRELPR=NRELPR+1
C     ASSIGNED GO TO FOR ANOTHER INCREMENT.
      GO TO 400
C     INCREMENT NUMBER OF SUBORDINATE CONJUNCTIONS BY ONE.
  300 NSCONJ=NSCONJ+1
C     INCREMENT THE NUMBER OF DALE WORDS BY ONE.
  400 NDALE=NDALE+1
C     TEST TO DETERMINE IF THE WORD IS A CONNECTIVE.
      IF(INTABL(WORD,CONNEC,60)) GO TO 500
C     RETURN TO CALLING PROGRAM.
      RETURN
C     INCREMENT NUMBER OF CONNECTIVES BY ONE.
  500 NCONN=NCONN+1
C     RETURN TO CALLING PROGRAM.
      RETURN
      END
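The branch order in CHKLST matters: labels 100, 200 and 300 all lead into label 400, so any word counted as a preposition, relative pronoun or subordinate conjunction is also counted as a Dale word, and only then tested as a connective. The sketch below restates that control flow in Python so the fall-through is easier to follow. It is an illustration under stated assumptions, not the original routine. The `tables` dictionary of sets is a hypothetical replacement for the sorted tables searched by INTABL, `Counts` stands in for the equivalenced counters, and `sentence_opening` plays the role of YY (the value of XX on entry).

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Illustrative stand-in for the NPREP, NRELPR, NSCONJ, NDALE, NCONN counters."""
    nprep: int = 0
    nrelpr: int = 0
    nsconj: int = 0
    ndale: int = 0
    nconn: int = 0


def check_word(word, tables, counts, sentence_opening=False):
    """Mirror the branch order of CHKLST; return the value left in XX."""
    w = word.upper()
    classed = False
    if w in tables["prep"]:
        counts.nprep += 1            # label 100
        classed = True
    elif not sentence_opening and w in tables["relpro"]:
        counts.nrelpr += 1           # label 200
        classed = True
    elif w in tables["subcon"]:
        counts.nsconj += 1           # label 300
        classed = True
    if classed or w in tables["dale"]:
        counts.ndale += 1            # label 400
        if w in tables["connec"]:
            counts.nconn += 1        # label 500
        return True
    if w in tables["connec"]:
        counts.nconn += 1            # label 500
        return True
    return False                     # XX = .FALSE.
```

Python sets replace the table search here only for brevity; the listing itself uses the binary search in INTABL for every lookup.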




APPENDIX A (Continued) 






     4CT),(SUMS(15),NUNDER),(SUMS(16),NDASH),(SUMS(17),NCOLON),(SUMS(18
     5),NSEMIC),(SUMS(19),NQUOTE),(SUMS(20),NEXCLA),(SUMS(21),NQUES),(S
     6UMS(22),NPREP),(SUMS(23),NCONN),(SUMS(24),NSPELL),(SUMS(25),NREL
     7PR),(SUMS(26),NSCONJ),(SUMS(27),NDALE),(SUMS(28),ENDPCT),(SUMS(29
     8),SENTYP)
      EQUIVALENCE(TOT(1),TID),(TOT(2),TTITLE),(TOT(3),NUMSEN),(TOT(4),N
     1UMPAR),(TOT(5),SVOPEN),(TOT(6),TOTLET),(TOT(7),TSQLET),(TOT(8),TOT
     2WDS),(TOT(9),TWDSQ),(TOT(10),TPAREN),(TOT(11),TAP
     3OS),(TOT(12),TCOMMA),(TOT(13),TPER),(TOT(14),TPERCT),(TOT(15),TUND
     4ER),(TOT(16),TDASH),(TOT(17),TCOLON),(TOT(18),TSEMIC),(TOT(19),TQU
     5OTE),(TOT(20),TEXCLA),(TOT(21),TQUES),(TOT(22),TPREP),(TOT(23),TCO
     6NN),(TOT(24),TSPELL),(TOT(25),TRELPR),(TOT(26),TSCONJ),(TOT(27),TD
     7ALE),(TOT(28),TENDPT),(TOT(29),TTYPE)
      EQUIVALENCE (SUMS(33),NHYPH),(TOT(33),THYPH)
      EQUIVALENCE (SUMS(34),NSLASH),(TOT(34),TSLASH)
      INTEGER NSLASH,TSLASH
      INTEGER NHYPH,THYPH
      EQUIVALENCE (NUNDER,NITAL)
      INTEGER LENGTH(100)
      REAL RDTBL(10540)
      EQUIVALENCE (RDTBL(1),SUBCON),(RDTBL(41),PREP),(RDTBL(141),RELPRO)
     1,(RDTBL(161),CONNEC),(RDTBL(221),DALE),(RDTBL(6221),SPELLX),(RDTBL
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APPENDIX A (Continued) 



     2(10221),DECLAB),(RDTBL(10241),SVOPN)
      REAL HLFTXT(200)
      EQUIVALENCE (HLFTXT,TEXT)
      COMMON/LOG/SENTND,ESSEND
      LOGICAL SENTND,ESSEND
      EQUIVALENCE(PUNCT(17),QUOTE),(PUNCT(18),PERCT)
      REAL QUOTE,PERCT
      COMMON/PSUM/ RELN,NEXT
      INTEGER RELN(20),NEXT
C     RELN(I) CONTAINS THE SUMS SUBSCRIPT CORRESPONDING TO PUNCT(I)
      COMMON/LIST2/ SWORD
      DOUBLE PRECISION SWORD(10)
      REAL SWRD(20)
      EQUIVALENCE (SWORD,SWRD)
      REAL ENDING,WRDEND(2)



C     THIS PART CLASSIFIES THE SENTENCE ACCORDING TO WHETHER THE OPENING
C     IS A SUBJECT VERB TYPE OPENING OR NOT
C     CONSIST OF CHECKING FOR CERTAIN WORDS, ELIMINATING CERTAIN WORDS
C     AND TESTING THE WORD ENDING OF THE FIRST WORD OF THE SENTENCE
C     FOR S
C     TYPING THE ARRAY LETTER AS REAL.
      REAL LETTER(12)
C     TYPING THE FUNCTION INTABL AS LOGICAL.
      LOGICAL INTABL
C     INITIALIZE SUBJECT-VERB VARIABLE.
      SUBVER=0
C     TEST TO DETERMINE IF FIRST WORD ENDS IN A S, IF SO, RETURN TO
C     CALLING PROGRAM.
      IF(INTABL(TEXT,SWORD,20)) RETURN
C     REPLACE ICT BY LENGTH(1).
      ICT=LENGTH(1)
C     REPLACE LL BY ICT MINUS FIVE.
      LL=ICT-5
C     UNPACK WORD.
      CALL UNPACK(TEXT(1),LETTER,12)
C     PACKS WRDEND WITH WORD ENDING DESIGNATED BY LETTER.
      CALL PACK(LETTER(LL),WRDEND,6)
C     THIS CALL STATEMENT SETS ENDING EQUAL TO ATION.
      CALL DATA(ENDING,6HATION )
C     TEST TO DETERMINE IF WRDEND IS SAME AS ATION.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     THIS CALL STATEMENT SETS ENDING EQUAL TO OLOGY.
      CALL DATA(ENDING,6HOLOGY )
C     TEST TO DETERMINE IF WRDEND IS SAME AS OLOGY.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     THIS CALL STATEMENT SETS ENDING EQUAL TO SHIP.
      CALL DATA(ENDING,6HSHIP  )
C     TEST TO DETERMINE IF WRDEND IS SAME AS SHIP.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     THIS CALL STATEMENT SETS ENDING EQUAL TO MENT.
      CALL DATA(ENDING,6HMENT  )
C     TEST TO DETERMINE IF WRDEND IS SAME AS MENT.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     TEST TO DETERMINE IF WORD BEING ANALYZED IS IN S-V LIST.
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APPENDIX A (Continued) 



      IF(INTABL(TEXT(1),SVOPN,300)) GO TO 3
C     UNPACK LETTER.
      CALL UNPACK(TEXT(1),LETTER,12)
C     IF LAST LETTER NOT EQUAL TO S RETURN TO CALLING PROGRAM.
      IF(LETTER(ICT).NE.S) RETURN
C     IF WORD IS IN S-V LIST, HAS ONE OF THE ABOVE ENDINGS, OR ENDS
C     WITH S, THEN CHANGE VARIABLE SUBVER FROM 0 TO 1 INDICATING
C     A S-V OPENING.
    3 SUBVER=1
C     RETURN TO CALLING PROGRAM.
      RETURN
      END
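Taken together, the tests in this routine amount to a short heuristic on the first word of a sentence: return at once if the word is in the SWORD table, otherwise flag a subject-verb opening when the word ends in ATION, OLOGY, SHIP or MENT, appears in the SVOPN table, or ends in S. The Python sketch below restates that heuristic. It is an illustration under stated assumptions, not the original code. `excluded_words` and `sv_openers` are hypothetical stand-ins for the SWORD and SVOPN tables, whose contents come from the data deck, and Python string methods replace the PACK/UNPACK character handling.

```python
# Word endings tested by the routine, as named in its comments.
NOUN_ENDINGS = ("ATION", "OLOGY", "SHIP", "MENT")


def is_subject_verb_opening(first_word, excluded_words, sv_openers):
    """Return True where the routine would set SUBVER to 1."""
    w = first_word.upper()
    if w in excluded_words:           # SWORD table: return with SUBVER = 0
        return False
    if w.endswith(NOUN_ENDINGS):      # one of the four listed endings
        return True
    if w in sv_openers:               # SVOPN table lookup
        return True
    return w.endswith("S")            # last letter is S
```

For example, with `excluded_words={"THIS"}` and `sv_openers={"THE"}`, the words "Education", "The" and "Dogs" are flagged, while "This" and "Although" are not.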



-223- 







APPENDIX A (Continued) 



$IBFTC TLU
      LOGICAL FUNCTION INTABL(A,B,M)
      COMPLEX A,B(8192)
      INTEGER N
C     LOGICAL COMPARISON SUBROUTINES.
      LOGICAL EQA,GTA,LTA
C     BINARY SEARCH -- A IS THE ARGUMENT, B THE TABLE, N THE TABLE
C     LENGTH.
C     REPLACE THE VALUE OF N BY THE VALUE IN M DIVIDED BY TWO.
      N=M/2
C     REPLACE INTABL BY THE LOGICAL CONSTANT FALSE.
      INTABL=.FALSE.
C     REPLACE THE VARIABLE J BY THE CONSTANT 4096.
      J=4096
C     REPLACE THE VALUE OF K BY THE VALUE OF J.
      K=J
C     TEST TO DETERMINE IF J EQUAL TO ZERO. IF SO, RETURN TO
C     CALLING PROGRAM.
    1 IF(J.EQ.0) RETURN
C     REPLACE THE VALUE OF J BY THE VALUE OF J DIVIDED BY TWO.
      J=J/2
C     REPLACE L BY THE MINIMUM VALUE OF THE TWO ARGUMENTS,
C     K OR N.
      L=MIN0(K,N)
      IF(LTA(REAL(A),REAL(B(L)))) GO TO 3
      IF(GTA(REAL(A),REAL(B(L)))) GO TO 2
      IF(LTA(AIMAG(A),AIMAG(B(L)))) GO TO 3
      IF(GTA(AIMAG(A),AIMAG(B(L)))) GO TO 2
C     REPLACE INTABL BY THE LOGICAL CONSTANT TRUE.
      INTABL=.TRUE.
C     RETURN TO CALLING PROGRAM.
      RETURN
C     REPLACE K BY THE VALUE IN K PLUS THE VALUE IN J.
    2 K=K+J
C     ASSIGNED GO TO FOR A TEST.
      GO TO 1
C     REPLACE K BY THE VALUE IN K MINUS THE VALUE IN J.
    3 K=K-J
C     ASSIGNED GO TO FOR A TEST.
      GO TO 1
      END
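INTABL is a fixed-step binary search: the probe index K starts at 4096, the step J is halved before every probe, and the probe is clamped to the table length with MIN0(K,N), so the same thirteen-probe loop serves any sorted table of up to 8191 entries. The Python sketch below shows the same scheme on a sorted list. It is a sketch under stated assumptions rather than a transcription. Ordinary Python comparison replaces the two-word LTA/GTA tests, the table is assumed to be sorted, non-empty and free of duplicates, and `start` is a parameter introduced here only so that small examples can be traced by hand.

```python
def in_table(key, table, start=4096):
    """Fixed-step binary search over a sorted, non-empty list.

    `start` must be a power of two greater than or equal to len(table) / 2
    rounded up to a power of two; 4096 covers tables of up to 8191 entries.
    """
    n = len(table)
    j = k = start
    while j != 0:
        j //= 2                      # halve the step before each probe
        probe = min(k, n)            # clamp the probe to the table length
        entry = table[probe - 1]     # the listing uses 1-based subscripts
        if key < entry:
            k -= j                   # label 3: move down
        elif key > entry:
            k += j                   # label 2: move up
        else:
            return True
    return False
```

Because the step sequence is fixed, the routine needs no explicit low and high bounds; the clamp alone keeps probes inside short tables.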

$IBFTC SUBUPC
      SUBROUTINE UNPACK(A,B,N)



APPENDIX A (Continued) 



      REAL LSHIFT
      DIMENSION A(4),B(100)
      DO 1 I=1,N
      J=(I-1)/6+1
      K=MOD(I-1,6)
    1 B(I)=LSHIFT(A(J),K)
      RETURN
      END

$IBMAP LSHFT 

' "ENTRY " LSHIFT 

LSHIFT S AVE 4 

CLA* 

ADD* 

TfD TEMP 

ADD TEMP 

ADD" TE’MP ■ 

STA S 

" ' CAL* 3t4 

S ALS ** 

ATJA #07 70000000000 

ORA #06060606060 

IlT " TEMP 

CLA temp : 

RETURN "LSHTFT 

TEMP 6$$ 1 

END 

$IBFTC SUBPAC
      SUBROUTINE PACK(A,B,N)
C     TYPING RSHIFT AS REAL.
      REAL RSHIFT
C     DIMENSIONING THE ARRAYS A AND B.
      DIMENSION A(100),B(4)
C     N DETERMINED BY ARGUMENT VALUE IN CALL STATEMENT.
      M=((N+11)/12)*12
      DO 1 I=1,M
C
C     MODULO FUNCTION WHEREIN INTERESTED IN REMAINDER OF I-1 DIVIDED
C     BY 6.
      K=MOD(I-1,6)
      IF(K.EQ.0) B(J)=0.
      D=A(I)
      IF(I.GT.N) D=-.5
    1 B(J)=OR(B(J),RSHIFT(D,K))
      RETURN
      END
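UNPACK and PACK move characters between a packed form, with six 6-bit characters in each 36-bit machine word, and a form that is easier to index one character at a time. The word index and the position inside the word come from integer division and remainder by 6, as in `J=(I-1)/6+1` and `K=MOD(I-1,6)`. The Python sketch below illustrates that arithmetic with integers. It is a simplified model under stated assumptions. `ALPHABET` is a hypothetical 6-bit code chosen for the example and is not the machine's BCD code, and the unpacked form here is an ordinary string rather than one character per word.

```python
import string

CHARS_PER_WORD = 6
BITS_PER_CHAR = 6
# Hypothetical 6-bit character code: a character's code is its index here.
ALPHABET = " " + string.ascii_uppercase + string.digits + ".,;:!?'()-/"


def pack(chars, n_words):
    """Pack characters left to right, six per 36-bit word, blank-padded."""
    width = n_words * CHARS_PER_WORD
    padded = chars.ljust(width)[:width]
    words = []
    for w in range(n_words):
        value = 0
        for ch in padded[w * CHARS_PER_WORD:(w + 1) * CHARS_PER_WORD]:
            value = (value << BITS_PER_CHAR) | ALPHABET.index(ch)
        words.append(value)
    return words


def unpack(words, n_chars):
    """Recover the first n_chars characters from packed words."""
    out = []
    for i in range(n_chars):
        j, k = divmod(i, CHARS_PER_WORD)   # word index, position in word
        shift = BITS_PER_CHAR * (CHARS_PER_WORD - 1 - k)
        out.append(ALPHABET[(words[j] >> shift) & 0o77])
    return "".join(out)
```

Packing a five-letter ending such as "ATION" into one word leaves a trailing blank in the sixth position, which matches the blank-padded six-character constants used when word endings are compared.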



SIBMAP 



RSHFT 

ENTRY 



RSHIFT 




CAL* 
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APPENDIX A (Continued) 



ANA 



#0770000000000 



LGR 
SL^ 
CL A 
LOQ 



TEMP 



RETURN 

BSS 



TEMP 
TEMP ‘ 
HOLD 
R SHIFT 
I 



HOLD 



BSS 

END 



$IBFTC DATSIN
      SUBROUTINE DATA(A,B)
C     TYPING THE ARGUMENTS A AND B AS REAL.
      REAL A,B
      A=B
C     RETURN TO CALLING PROGRAM.
      RETURN
      END
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APPENDIX A (Continued) 



SIBMAP LCHP 



* 

it 

Jc" 



logical comparison subroutines 



I 





ENTRY 


GTA 


G'tA 


SAVE 

CAL* 


-- 4 - 

3,4 




LAS* 


4,4 




TRA 


_RAl 




■ NO'P 






ZAC 






RETURN 


GfA 


_RA1 


CLS 


#0 




RETURN 


GTA 




ENTRY 


GEA 


GEA" ■ 


S^AVE 


-4 




CAL* 


3,4 




LAS* 


■ 4,4 




TRA 


RA2 



RA2 



TRA 

ZAC 

RFTURN 

CL_S 

■return 

ENTRY 



RA2 



GEA 

£0 

GEA 

LTA 



LTA 



SAVE 


4 


CAL* 


4,4 


LAS* 


3,4 


TRA 


RA3 


NOP 




ZAC 







RETURN 


LTA 


RA3 


CLS 


«0 




■" RETURN 


LTA 




ENTRY 


LEA 


LEA 


SAVE 


"4 




CAL* 


4,4 



LAS* 




3,4 
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— 


TRA 


RAA 




TRA 

ZAC 


RA4 


_ RA_4_ 


RETURN 

CIS 

RETURN 

ENTRY 


LEA p 

LEA 1 

EQA — 


EQA 


SAVE 

CAL* 


A 1 

3f4 1 




LAS* 

TRA 


*S2 - - -“1 




TRA 

ZAC 


RA5 1 


RA6 


RETURN 

CLS 


"TcJa I 

i^O __ -1 


lA M J 


RETURN 

ENTRY 


EQA 

NEA - 1 


■~NE.A 


SAVE 

CAL* 


^ 1 

3t4 1 




LA$* 

TRA 


1 

*£2 ^ 


— • - 


"TRA 

CLS 

RETURN 

ZAC 


RA'6 

*0 1 


RAfi 


NEA 1 




RETURN 

END 


NEA 


ViBMAP 


SETFP 

ENTRY 


FPTRAP 


FPTRAP 


SAVE 

AXT 


-ItA 




SXA 

RETURN 


SETFP. Sl4f4 

FPTRAP ^ 




EXTERN 

END 


SETFP. 


SENTRY 




PEG I 
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APPENDIX A (Continued) 



AFTER 

SINCE 

WHEN 



ALTHOUGH 

THAN 

WHERE 



AS 

THOUGH 

WHEREVER 



ZZZZZZZZZZZZZZZZZZZZZZZZABOUT 



AMID 



AMONG 



BECAUSE 

UNLESS 

WHILE 

ABOVE" 

AROUND 



BEFORE IF 

■UNTrL " WHENEVER "" 

ZZZZZZZZZZZZZZZZZZZZZZj^Z 
ACROSS " ’ A"GAINSt 

at BEHIND 



BELOW 


BENEATH 


BESIDE 
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KEEN 
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KEPT 
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KINGDOM 
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LOCOMOTIVE 
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MADE 
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MAJOR ^ 


MAKE 


MAKING


MALE 


MAMA 


MAMMA 


MANAGER | 


MANE 


MANGER 


MAN 


MANY 


MAPLE 


MAP J 


MARBLE 


MARCH 


MARE 


MARKET 


MARK 


MARRIAGE j 


MARRIED 


MARRY 


MA 


MASK 


MASTER 


MAST ^ 


MATCH 


MAT 


MATTER 


MATTRESS 


MAYBE 


MAYOR 1 

, J? 


MAYPOLE 


MAY 


MEADOW 


MEAL 


MEAN 


MEANS 1 


MEANT 


MEASURE 


MEAT 


MEDICINE 


MEETING 


MEET 1 


MELT 


MEMBER 


MEND 


MEN 


MEOW 


MERRY 1 


ME 


MESSAGE 


MESS 


METAL 


MET 


MEW 1 


MICE 


MIDDLE 


MIDNIGHT 


MIGHT 


MIGHTY 


MILE 1 



MILKMAN 

<^iner_ 

MISSPELL 

MIX 

MOONLIGHT 

MORNING 



MILK 

MINE 

Miss 

MOMENT 



MINT 



MOON 

MORROW 



MISTAKE 

MONDAY 

MOO 

MOSS 



MINUTE 

MISTY 

M ONEY 

MOOSE 

MOSTLY 



MIRROR 



MITTEN 

MONKEY 

"MOP" 

MOST 



MOTOR 

MOVIE 



MUCH 

MURDER 



NAME 



MOUNTAIN
MOVIES
MUDDY 
MUSIC 

napkin" 



MOUNT 
MOVI NG 
MUD 
MUST 



NAP 



MOUSE 
MOW_ 
MUG ' 

MY 

NARROW" 



MOUTH 

MR._ 

MULE 

MYSELF 

NASTY" 



NAVY 


NEARBY 


NEARLY 


NEAR 


NEAT 


NECKTIE 


NEEDLE 


NEEDN T 


NEED 


NEGRO 


NEIGHBOR 


NEITHER 


NERVE 


NEST 


NET 


NEVER 


NEW 


NEWSPAPER


NEWS


NEXT 


NICE 


NICKEL 


NIGHTGOWN 


NIGHT 


NINE 


NINETY 


NOBODY 


NOD 


NOISE 


NOISY 


NOON 


NOR 


NORTHERN 


NORTH 


NO 



MISC_HIEF 

mTt't 

MONTH

MORE 

MOTHER 

MOVE 

MRS. 

MULTIPLY' 

NAIL 

NAUGHTY 

NECK 



neighborhood 

NEVERMORE 

NIBBLE 

NINETEEN 

NONE 

NOSE 

NOWHERE 



NOTE 



NOTHING 



NOTICE 



NOT 
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4 

4 

\ 


NOW 


NUMBER 


NURSE 


NUT 


OAK 


OAR 1 


OATMEAL 


OATS 


OBEY 


OCEAN 


OCLOCK 


OCTOBER 1 


ODD 


OFFER 


OFFICER 


OFFICE 


OFF 


OF 1 


OFTEN 


OH 


OIL 


□LD-FASHIONEOLD 


ONCE 1 


ONE 


ONION 


ONLY 


ON 


ONWARD 


OPEN 1 


ORANGE 


ORCHARD 


ORDER 


ORE 


ORGAN 


OR 1 


OTHER 


OTHERWISE 


OUCH 


OUGHT 


OUR 


OURSELVES I 


OURS 


OUTDOORS 


OUTFIT 


OUTLAW 


OUTLINE 


OUT 1 


OUTSIDE 


OUTWARD 


OVEN 


OVERALLS 


OVERCOAT 


OVEREAT I 


OVERHEAD 


OVERHEAR 


OVERNIGHT 


OVER 


OVERTURN ■ 


OWE \ 


OWING 


OWL 


OWNER 


OWN 


OX 


PACE 1 


PACKAGE 


PACK 


PAD 


PAGE 


PAID


PAIL ji 


PAINFUL 


PAIN 


PAINTER 


PAINTING 


PAINT 


PAIR 1 


PALACE 


PALE 


PAL 


P AMERICA 


PANCAKE 


PANE 1 


PAN 


PANSY 


PANTS 


PAPA 


PAPER 


PARADE 1 


“ PARDON ' 


PARENT 


PARK 


PARTLY 


“ ■ PA 'R T'N E R 


~ PART 1 


PARTY 


PA 


PASSENGER 


PASS 


PASTE 




"PASTURE' 


PATCH" 


PATH 


PAT 


patter 


PAVEMENT I 


PAVE 


PAW 


PAYMENT 


PAY 


PEACEFUL 


PEACE 1 


PEACHES 


PEACH 


PEAK 


PEANUT 


PEARL 


PEAR 1 


PEA 


PEAS 


PECK 


PEEK 


PEEL 


PEEP i 


‘PEG 


PENCIL' 


PENNY 


PEN 


PEOPLE 


PEPPERMINT ' 1 


PEPPER 


PERFUME 


PERHAPS 


PERSON 


PET 


PHONE 1 


PIANO ■' 


PICKLE


PICK


PICNIC 


PICTURE 


"PIECE i 


PIE 


PIGEON 


PIGGY 


PIG 


PILE 


PILLOW 


PILL 


PINEAPPLE 


PINE 


PINK 


PIN 


PINT 


PIPE 


PISTOL 


PITCHER 


PITCH 


PIT 


PITY 


PLACE ■ 


PLAIN 


PLANE 


■ -p-QN ■ 


PLANT 


"PLATE' 


PLATFORM 


PLATTER 


PLAYER 


PLAYGROUND PLAYHOUSE 


PLAYMATE 


PLAY 


PLAYTHING


PLEASANT 


PLEASE 


PLEASURE 


PLENTY 


PLOW 


PLUG 


PLUM 


POCKETBOOK POCKET 


POEM 



POINT 

POLISH 

POOR 

POSSIBLE 

pdt' ' ■ 
PRAISE 



POISON 

POLITE 

POPCORN 

POSTAGE 



POUND 

PRAYER 



POKE 
POND 
POPPED 
POSTMAN 
POUR ■ 
PRAY 



PONIES 



PONY 



POP 

POST 



PORCH 

POTATOES 



POWDER 

PREPARE 



POWERFUL 

PRESENT 



POOL 
PORK " 
PO TATO 
POWER “ 
PRETTY 



PRICE 


PRICK 


PRINCE 


PRINCESS 


PRINT 


PRIZE 


PROMISE 


PROPER 


PROTECT 


PROUD 


PRUNE 


PUBLIC 


PUDDLE


PUFF


PULL 


PUMP 


PUNCH 


PUNISH 


PUPIL 


PUPPY 


PURE


PURPLE


PURSE 


PUSH 


PUSS ' 


PUSSY 


PUT 


PUTTING 


PUZZLE 


QUACK 



PRISON 

PROVE 

PUMPKIN 

PUP 

PUSSYCAT 
QUARTER 



QUART 

quiet 

RACK 

RA ILW AY 

RAKE 

RAP 



QUEEN 

QUILT 

RADIO 

RAINBOW

■ram'" 

RATE 



QUEER 

QU!TE_ 

MbiSH 

RAIN 

"RANCH 

RATHER 



QUESTION 

QUIT 

RAG' 

RAINY^ 

RANG 

RAT 



QUICKLY 
RABBIT 
RA ILROAD 
RAISE 
RAN 

rattle 



QUICK 

RACE 

RAIL 

RAISIN 

RAPIDLY 

RAW 

READY 

REBUILD 

RED 

REMIND 

REPORT 

RIB 

RIDING 



RAY 

REALLY 

■receive 

REFUSE 

REMOVE 

REST 



RICE 



REACH 

REAL 

RECESS ' 
REINDEER 

■rent" 

RETURN 

RICH 



READER 

REAP 

RECORD ' 

REJOICE 

REPAIR 

REVIEW 



READING 

REAR 

'REDB'iRD 

REMAIN 

REPAY 

REWARD 



READ 

REASON 

REDBREAST 

REMEMBER 

REPEAT 

RIBBON 



RIDDLE 



RIDER 



RIDE 
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RIGHT 



RIM 



RING 



RIPE 



RIP 



RISE 

ROAST 

ROCK 

ROOM 

ROT 



RISING 

ROBB ER 

ROCKY 

ROOSTER 

ROTTEN 

ROYAL 



RIVER 

ROBE 

RODE 

ROOT 

ROUGH 

RUBBED 



ROAD 

ROBIN 

ROLLER 

ROPE 

ROUND 

RUBBER 



ROADSIDE 

ROB 

roLl 

^SEBUD 

ROUTE 

RUBBISH 



ROAR 

ROCKET 

ROOF 

ROSE 

ROWBOAT 

RUB 



RUG 

RUj^NING 

SACK 

SAID 

SALE 

SANG 



RULER 

RUN 



RULE 

RUSH 



SADDLE 

SAILBOAT 



SADNESS 

SAILOR 



RUMBLE 

RUST 

$A"b 

SAIL 



RUNG 

RUSTY 



SAFE 

SAINT 



SALT 

SANK 



SAME 

SAP 



SAND 

SASH 



SANDWICH 

SATIN 



RYE 

SAFETY' 

SALAD 

sandy' 

SATISFACTORY 



SAT 

SAW 

SCHOOL 30'Y 
SCORE 

scr'ew ■' 

SEASON 



SATURDAY 

SAY 



SAUSAGE 

SCAB 



SAVAGE 

SCALES 



SAVE 

SCARE 



SCHOOL HOUSE 

SCRAPE_ 

SCRUB 

SEAT 



■^HOOLMASTERSCHOOLROOM SCHOOL 
SCRAP SCRATCH SCREAM 



SEAL 

SECOND 



SEAM 

SECRET 



SEARCH 

SEED 



SAVINGS 
SCARF 
SCORCH” 
^REEN 
SEA ■ 
SEEING 



SEEK 

SELFISH 

sent 

^ET _ _ 
SEVENTH 
SHADY 



SEEM 

SELF 



SEEN 

SELL 



SEE 

SEND 



SENSE 



select 

SENTENCE 



SEPARATE 

SETTING 

seventy" 

SHAKER 



SEPTEMBER 

SETTLEMENT 



SERVANT 

SETTLE 



SERVE 

SEVEN 



SERVICE 

SEVENTEEN 



SEVERAL 

SHAKE 



SEW 

SHAKING 



SHADE 

SHALL 



SHADOW 

SHAME 



SHAN T 
SHE LL 



SHAPE 
SHE S 



SHARE 

SHEAR 



SHARP 

SHEARS 



SHAVE 

SHED 



SHE D 
SHEEP 



SHEET 

SHINING 



SHELF 

SHINY 



SHELL 

SHIP 



SHEPHERD 

SHIRT 



SHE 

SHOCK 



SHINE 

SHOEMAKER 



SHOE 

SHORE 



SHONE 

SHORT 



SHOOK 

SHOT 



SHOOT 

SHOULDER 



SHOUT 

SICKNESS 

Tight 

SILl^Y 

SING ' 
SISSY 



SHOVEL 

SICK 



SIGN 

SILVER 



SINK 

SISTER 



SHOWER 

SIDE 

SILENCE 

SIMPLE 

sTn 

SIT 



SHOW 

SIDEWALK 



SHOPPING 
SHOULDN T 
shut 

SIDEWAYS 



SHOP 

SHOULD 



SHY 

SIGH 



SILENT 

SINCE 



SILK 

SINGER 



SILL 

SINGLE 



SIP 

SITTING 



SIS 

SIXTEEN 



SIXTH 

SKIP 

SLATE 

SLEIGH 

SLIPPED 



SIXTY 

SLAVE 

SLEPT 

SLIPPER 



SIZE 

SKI 

SLED 

SLICE 



SKATER 

SKY 



SKATE 

SLAM 



SKIN 

SLAP 



SLEEP 

SLIDE 



SLIPPERY 

SMACK 



SLIP 

SMALL 



SLEEPY 

SLID 

SCiT 

SMART 



SLEEVE 

SLING_ 

SLOWLY 

SMELL 



I 



SMILE 

SNAP 



SNUFF 

SODA 



SMOKE 

SNEEZE 

SNUG 

SOD 



SOLD 

SOMETHING 



SOLE 

SOMETIME 



SMOOTH 
SNOWBALL _ 

soak" 

S OFA 

SOMEBODY 

SOMETIMES 



SNAIL 

SNOWFLAKE 

SOAP' 

SOFT_ 

SOMEHOW 

SOMEWHERE 



SNAKE 

SNO^W 

SOB"' 

SOIL 



SOMEONE 

SONG 



SNAPPIN( 

SNOWY 

SOCKS 

SOLDIER 

SOME ' ' 

SON 



SOON 

SOUL 



SORE 

SOUND 



SPACE 

SPEAR 



SPADE 

SPEECH 



SPENT 

SPIRIT 

SPOON 



SPIDER 

SPIT 

SPORT 



SORROW 

squp_ 

SPANK 
S2.EED 
SPIKE 
SPLAS H 
SPOT 



SORRY 

SOUR 



SORT 

SOUTHERN 



SPARROW 
SPELL_ING_ 

spIll" 

SPOIL 
SPREAD 
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SPEAKER 
SPELL 
'SPINACH 
SPOKE 
SPRING 



SO 

SOUTH 
"SPE'AK 
SPEND 
S>IN " 

SPOOK 
SPRINGTIME 
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SPRINKLE 



SQUARE 



SQUASH 



SQUEAK 



SQUEEZE 



SQUIRREL 



STABLE 
STAND 
STATION ■ 
STEAM 
STEPPING 

still 



STACK 
STARE 
STAY ■ 
^TEEL 
STEP ‘ 
STING 



STAGE 

STAR 

STEAK 

STEEPLE 

STICK 

STIR 



STAIR 

START 

STEAL 

STEEP 

STICKY 

STITCH 



STOLE 

STOPPING 

STORMY 

STRAP 

STRING ■■■ 

STUFF 



STONE 

STOP 

STORY 

STRAWBERRY 



STRIPES 

STUMP 



STOOD 

STORE 

STOVE 

STRAW 

STRIP 

STUNG 



SUDDEN 

SUNDAY 

SUNRISE 

SURELY 

SWAM 

SWEEP 



SUFFER 
Sl^F LOWER _ 

SUN 

SURE 

'SWAN 

SWEETHEART 



TUGAR 

SUNG 

SUNSET 

SURFACE 

SWAT 

SWEETNESS 



STOOL 

STORIES 

STRAIGHT' 

STREAM 

STRONG 

SUBJECT 

TUTT 

SUNI^ 

SUN SHINE 
^URPfUSE 
SWEAR 
SWEET 



STALL 
STARVE 
STEAMBOAT 
STEER 
STIFF 
STOCKING 
STOOP 
STORK 
STRANGER 
STREET 
STUCK 
SUCH 
SUMMER 
SUNLIGHT 
SUPPER 
SWALLOW 
"SWEATER ■■ 
SWELL 



STAMP 

STATE 

STEAMER 

STEM 

STILLNESS 

STOCK 



STOPPED 

STORM 

STRANGE 

stretch 

STUDY 

SUCK 



SWIFT 

SWORE 

TAG ■■ 

TALE 

TAN 

TASTE 



SWIMMING 

TABLECLOTH 

Ta i'LoR 

TALKER 

TAPE 

TAUGHT 



SWIM 

TABLE 

Tail 

TALK 

TAP 

TAX 



SWING 

TABLES^ON 

TAKEN 

TALL 

TARDY 

TEACHER 



; ^H 

TABLET 



TAKE 

TAME 



TAR 

TEACH 



TEAR 

TELL 

TERRIBLE 

THAN_ 

the'm 

THEY LL 



TEA 

TEMPER 
TEST 
THAT S 
THEN 
THEY RE 



TEASE 

TENNIS 

THANKFUL 

THAT 

THERE 

THEY VE 



TEASPOON 

TEN__ 

THANK 

THEATER_ 

THE 

THEY 



TEETH 

TENT 



“SUM" 

SUNNY 

SUPPOSE 

SWAMP 

SWEAT 

SWEPT 

SWORD 

TACK 

'TAKING 

LANK 

TASK 

TEAM 

TELEPHONE 

TERM 



THANKSGIVINGTHANKS 
THEE THEIR 

■ THEY d 



THESE 

THICK 



THIEF 



THIMBLE 

THIRTEEN 

THOUGH ■■ 

THROAT_ 

THUNDER 

TIE 



THING 

THIRTY 

THOUGHT 

THRONE 



THURSDAY 

TIGER 



THINK 

THIS 

"THdUSANb 

JHROUGH 

THY 

TIGHT 



THIN 

THORN 

THREAD 

THROWN 

flCKET 

TILL 



THIRD 

THO 



THREE 

THROW 



TICKLE 

TIME 



THIRSTY 

THOSE 

threw" 

TJ^fUMB _ 

TICK 

TINKLE 



TIN 

TITLE 

TOE 

TONE 

too" 

TORE 



TINY 

TOAD 

fOGETHER 

TONGUE 

tdOTHBRUSH 

TORN 



TIP 

TOADSTOOL 

tO'lLEf 

TONIGHT 

TOOTHPICK 

TO 



TIPTOE 

TOAST 

TOLD 

TON 

TOOTH' 

TOSS 



TIRED 

TOBACCO 

TOMATO 

TOOK 

TOOT 

TOUCH 



TIRE 

TODAY 

TOMORROW 

TOOL 

TOP 

TOWARD S 



TOWARD 
TRACE_ 
TRAY 
TRIED 
tRUE ' 



TOWEL 

TRACK 

TREASURE 

TRIM 

T'RULY 



TOWER 

TRADE 

TREAT 

TRIP 

TRUNK 



TOWN 

TRAIN 

TREE 

TROLLEY 

fRUSt 



TOW 
TRAMP 
TRICK 
TROUBLE 
tRUTH 
TUMBLE 



TOY 

TRAP 

TRICYCLE 

TRUCK 

TRY 

TUNE 



TUNNEL 


TURKEY 


TURN 


TURTLE 


TWELVE 


TWENTY 




TWICE 


TWIG 


TWIN 


TWO 


UGLY 


UMBRELLA 




uncl'e 


undTr 


UNDERSTAND 


UNDERWEAR 


UNDRESS " 


UNFAIR 




UNFINISHED 


UNFOLD 


UNFRIENDLY 


UNHAPPY 


UNHURT 


UNIFORM 




UNITE'D StATEUNKINO 


UNKNOWN 


UNLESS 


UNPLEASANT 


UNTIL 


jF 


UNWILLING 


UPON 


UPPER 


UP 


UPSET 


UPSIDE 


J 


UPSTAIRS 


UPTOWN 


UPWARD 


USED 


USEFUL 


USE 










238- 













- - 




APPENDIX A (Continued) 


s 

5 

1 

f 

1 

j 


US 


VALENTINE 


VALLEY 


VALUABLE 


VALUE 


VASE 1 


VEGETABLE 


VELVET 


VERY 


VESSEL 


VICTORY 


VIEW [} 


VILLAGE 


VINE 


VIOLET 


VISITOR 


VISIT 


VOICE || 


VOTE 


WAGON 


WA(^ 


WAiSt 


WAIT 


'WAKEN J 


WAKE 


WALK 


WALL 


WALNUT 


WANT 


WARM H 


WARN 


WAR 


“ WASHER 


WASH 


WASHTUB 


WASN T 


WAS 


WASTE 


WATCHMAN 


WATCH 


WATERMELON 


WATERPROOF 1 


WATER 


WAVE 


WAX 


WAY 


WAYSIDE 


WE D H 


WE LL 


WE RE 


WE VE 


WEAKEN 


WEAKNESS 


WEAK 1 


WEALTH ' 


WEAPON 


WEAR 


WEARY 


weather 


WEAVE I 


WEB 


WEDDING 


WEDNESDAY 


WEED 


WEEK 


WEEP 1 


WEE 


WEIGH 


WELCOME 


WELL 


WENT 


WERE i 

‘4 


WE 


WESTERN 


WEST 


WET 


WHALE 


WHAT S 1 


WHAT 


WHEAT 


WHEEL 


WHENEVER 


WHEN 


WHERE 1 


WHICH 


WHILE 


WHIPPED 


WHIP 


WHIRL 


WHISKY 1 

— — — - — - — — 


WHISPER 


WHISTLE 


WHITE 


WHO D 


WHO LL 


WHIT ^ :i 


WHOLE 


WHOM 


WHO 


WHOSE 


WHY 


WICKED I 


WIDE 


WIFE 


WIGGLE 


wildcat 


WILD 


WILLING 1 


WILLOW 


WILL 


WINDMILL 


WINDOW 


WIND 


WINDY 1 


WINE 


WING 


WINK 


WINNER 


WIN 


WINTER ^ 


WIPE 


WIRE 


WISE 


WISH 


WITCH 


WITHOUT j 


WITH 


WIT 


WOKE 


WOLF 


WOMAN 


WOMEN || 


WON T 


WONDERFUL 


WONDER 


WON < 


WOODEN 


WOODPECKER 1 


WOOD 


WOODS 


WOOLEN 


WOOL 


WORD 


WORE i 


WORKER 


WORKMAN 


WORK 


WORLD 


WORM 


WORN 1 


WORRY 


WORSE 


WORST . 


WORTH 


WOULDN T 


WOULD 1 


WOUND 


WOVE 


WRAPPED 


WRAP 


WRECK 


WREN 1 


WRING 


WRITE 


WRITING 


WRITTEN 


WRONG 


WROTE 


WRUNG 


YARD 


YARN 


YEAR 


YELLOW 


YELL 1 


YES 


YESTERDAY 


YET 


YOLK 


YONDER 


YOU D 1 


YOURSELF 


YOURSELVES 


YOURS 


YOU 


youth 


llllllllllll 1 



zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 

777777ZZZZZZZZZZZZZZZZZZZZZZZZZZ ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ 

zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 

77777777 777ZZZZZZZZZZZZZZZZZZZZZZ ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ Z 

zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 

zzzz zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 

zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 



ZZZZZZZ2ZZZZZZZZZZ2ZZZZZABSENSE 


ABSOLUTLY 


ABUNDENCE 


ABUNDENT 


ACADAMY ACCELLERATORACCEPTENCE 

ACCOMOOATIONACCROSS ACEDEMIC 


ACCESSA8LE 

ACHEIVEMENT 


ACCIDENTLY 

ACHIEVMENT 


ACCOMODATE 

ACKNOWLEGE 


ACKNOWLEGING ACTUAL Y 
ACURATE ADEQUETE 


ACUMULATE 

ADMITTENCE 


ACUMULATING 

ADOLECENT 


ACURACY 
Ap_OLESE_NT _ 


ACURACY 

ApyANTAGOUS 


AOVERTISMENTAFAIR 
AGRAVATING AGRESIVE 


AGGRESSIVE 

AGRESSIVE 


AGGRlVAfE 

AGRIVATE 


ag(?r I vat i ng 

AGRIVATING 


agravaTe 1 

ALLEDGE 1 


ALLEDGING 

ALPHEBET 


ALLEGIENCE 

ALRIGHT 


ALLOTEO 

AMOUNG 


ALLWAYS 

AMONG 


ALOT 

APARATUS 


alotTed 1 

APINION 1 


APOLIGIZE 

AQUIRE 


APOLOJIZE 

AQUIRING 


APPARANT 

ARGUEING 


APPARANT 

ARGUEMENT 


APPEARENCE 

ARGUEMENT 


AQUAINT 1 

ARRANGMENT 1 


ARTICAL 

ATTATCH 


ARTIC 

ATTENDENCE 


ASSASiN 

ATTENDENT 


ATHELETE 

AUDIANCE 


ATHELETIC 

AUGEST 


atiTude 1 

AUTHOR AT AT IV J 


AUTHORATY 

BALOON 


AUTOES 

BARBEROUS 


AUTOES' 

BARGIN 


BACKRQUND 

BASICLY 


BA'LENCE 

BASICLY 


BALENCING I 

BEATIFUL 1 


BECOMMING 


BEFOR 


BEGGER 


BEGINING 


BELEIF 


BELEIVE 1 

i 
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8ELEIVING BENIFICIAL BENIFITED BENIFITTED BEUTIFUL 



3IGER 

BRETH 

BURYED 

CAPTIN 

CELLER 

CHANGA8LE 



BISCIT 
3RITIN 
BUSNESS 
_CARFUL 
CEMET ARIES 
CHARACTOR 



81 SCUT 
BULETTIN 
CAFATERIA 
CATAGQRIES 
CEMETARY 
CHARICTER 



300KEEPER 

3ULLETTIN 

CALENDER 

CATAGQRIES 

CENTERY 

CHILDERN 



BOUNDRY 
8URGLER 
CAMOFLAGE 
CATAGORY 
CERTIN 
CIELING 



BRECKFAST 

BURIEL 

CAMPAIN 

CATAGORY 

CHALLANGE 

CIGARETE 



; 



CIGERETTE 

COLLOS^AL 

COMPAR'ifiVE 

CONC lEVABLE 

CbNSERT 

CONSTENT 



CIGGARETE 
CQLOSAL 
CCMPAT ABLE 
CONCIEVE 



CIGGARETTE 
CCMERCIAL 
COMPETANT 
CONCIOUS 
CONSlbUS^ 



COFEE 
COMITTEE 
COMPLEAT 
CGNSEAL 
CONSISTANCY 



COLLEDGE 
COMHING 
COMPLETLY 
CONSENTRATE 
CONSiSTANt ■ 



CONVINIENCE CORELATE 
CORRESPOJ^pANCOUNCEL_ _ 
CRUELY'' "'CORrCULAR 
CUSTOM _ decent 
OPSlbE' ■■ ■ desTreable 

DICIPLINE OICIPLINING 



CONSIENCE - „ 

r fiNTPMPFP arycontemporerycontrave rsy control ing 

Cb ROBORAf E ' 

CRITICICM 



COLLOSAL 

COMMITEE 

COMUNI ST 

CONSERN 

CONSISTANT 

CONVIENCE 



CORIDOR 
COURAGOUS 
CUR I COLLAR 
DEFERENCE 
OESPARF 
OIFERENT 



coroboratingcorperation 
CRITICISE CRITISISM 



OILEMA 

o_i_saster_ous 

biscRTptib'N 

DISSAPOINT 

oonkies” ‘ 

ELABERATE 



DILIGANT 
qi SATISFIED 

DirGISE 

DISSAPPOINT 

DROPED 

ELECTRICTY 



DILLEMA 

qiSCRIBE 

brsiPlE 

DISTRUCTION 

ECHOS 

ELECTRISITY 



CO'R ICULUM 
DEMJCRASY 
blCEASE 
DIFFRENT 
DINNING' 



CURRING 

DEPENDANT 

DICIPLE 

DIFICULT 



"CURrCU'LLOM 
DEFINATE 
D'EV ELOPE 

01 FFERANT 

DILLIGENT DINNING OISAPOINT 

DISCRIMANATEOISCRIMANATIDISCRIMANATI 
^blSTP'LTN'E DISIPLINiNG OISPAIR 

OOCTER OOMINENT OOMINENT 

EFFICIENtCY EIGTH 

ELIGABLE EMBARASS 



ECSTACY 

ELIGABLE 



g 

& 

I 



E 



i 



EMBARRAS 

ENTRENCE 



EQUIPPMENT 
EXCEDING _ 

exercizT 

EXITE 



EMINANT 

ENVIREMENT 

'ERAtiC 

JXCEIAANCE 

exersise' 
EXITING 



EMPERER 
ENVOLVE 
EXAGERATE 
EXCELLANT 
EXERSIZE 
EXPENCE 



ENTERPRIZE 

EPEDEMIC 



ENDEAVER 

EPADEMIC 

EXAGERATING EX A US T 
EXELLENCE EXELLENT 
■'EX i BIT 



EXISTANCE 

EXPERAME NT EXPEREMENT 

EXPLAINATI 'ONEXPLINATION EXTRACURICULEXTRACURR ICUEXTREAM 



PAGINATE _ 
FA MIL I ER 
FEBUARY_ _ 
FOURTY 
FUNPEMENTAL 



FAC INATING 
FANTACY 
FEILD 



FACINATION 
FASINATE 
FICTICIOUS 



FALLICIES FALLICY 



FREIND 

GARENTEE 



F AS i'NAt I NG FAS"i'NATi'ON 

FJCTIOUS FINALY 

FR i E N DL Y N ES S F R I GHT N l' NG FUE 6 AL 
GAYETY GENERALY GILTY 



GON 

GRUSOME 

HANKERCHIEF 

HEROS 

HUNDERD 

IGNORENCE 



GOVENOR 

GUAGE 



HAPPENNED 
HINDRENCE 
HUNGER Y 
IGNORENT 



'GOVERMENT 

GUARENTEE 

happy¥ess 

HORRABLE 

HORRIDLY 

IMAGINERY 



GRAMMER 

GUIOENCE 

HARRAS ' 

HORRABLY 

HYGEINE 

IMEDIATE 



GRANDURE 
GYMNAZIUM 
HARR ASS 
HUMEROUS 
■^HYPOCRACY 
IMENSE 



ENTIRLY 

EQUIPED 

EXCEOE 

EX^T 

EXITABLE" 

EXPERIANCE 

EXTREMLY 

FAMILAR 

favor IT* 

FORIEGN 

FULLFIL 

GODESS 

GRANDUR 

HANOKERCHEIF 

HEREOITERY 

HUMER 

lOEALY 

INFORMATION 



IMMAGRANT IMMEOIATLY IMMENCE IMMIGRENT 

IMPORTENCE X^OB^ENT JMPORTENT X^CONVIENCE 
rNOEPENDANCEINOEPENDANT INOREOIANT INEVITIBLE 

INGENOUS INITATIVE INOCENT INTELECT 

iWRESf ’ IRR EL EVENT IRRESISTABLEJEALUS 

JEWLERY JOURNIES JUVENIL JUVIMILE 



IMPERTINANT IMPORTENCE 

inconviniencincreoable 

INEV'ITIBLY INFLUENCIAL 
interferanceinterpertati 
Jelous jewelery 

laberor labrat ories 



LAYEO 

LIVLIEST 



LABRATORY 

LITRATURE 

'LUXERIE"S LUXERY 

HA INJXXl^NCEMAXICqjU^^ 
MARRIOGE " MATERIEL 
MELENCHOLY MENT 
MISPELL MONKIES 



LEIZURE 
LIVLIHOOD 
MABE 
MANER 
MATHAMATIC 
METEPHOR 
MONOTONUS 




LICENCE 
LONLEYNESS 
MAGIZINE 
MANOUVER 
ME DECAL 
MINAMUM 
MORELLY 
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liesure liklihood 

LONLINESS LOOSES 

M'AGN IF IC ENSEMAGNI F I SENCE 
MARI AGE MARRAGE 

MEDECiNE * MEOOW 

HINITURE MISCHEIVOUS 
MORGAGE MORILLY 
















.MOSQUITOS MUSLE 



APPENDIX A (Continued) 

MUSSLE NARATIVE NATURALY NECCESARILY 



NECCESITY 

NEGROS 



NESESSARY NESESS ITY 

NUMQRO US NUSA NCE 

OCASSION OCCASSION
OCCURING OCCURRANCE 



NECCESSARILY NECCESSARY
NEICE NEIGBOR

NICKLE
OBEDIANCE
OCCURANCE
OCCURRANCE 



NECCESSITY 

NESECITY

NINETH 

OBEDIAI^ 

OCCURANCE

OMITED 



NECESARY 
NESESARILY
NOTICABLE

obstjcle 

OCCURENCE

OMMIT 



NEGHBOR 
NESESITY 
NUMERUS 
OBSTICLE 
OCCURENCE 
OMMIT 






OMMITTED

OPTOMIST 



OPION 

ORGINIZE 



OPORTUNITY 

ORIGINEL 



OPPERATE 

ORIGINIL 



OPPONANT 

PAMFLET 



OPTOMISM 

PARALIZE 



PARIDISE 

PARTICULER 



PARLAMENT 

PAYED 



PARLEMENT 

PEASENT 



PARRALEL 

PECULIER 



PARRALLEL 

PERCIEVE 



PERMANANT 

PERSUING 



PERMENENT 

PERTINANT 



PERSISTANT 

PHAMFLET 



PERSONALY 

PHENOMINA 



PLAGERIZE 

POLITITIAN 



PLAJERISM
POLITITION 



PLAJERIZE 

PORTRATE 



PLAUSABLE 

PORTRAT 



PERSONEL 

PHENOMINON 

PLAYWRITE 

PORTRIT 



PARTICULER 

PERFORMENCE 

PERSUE 

PLAGERISM 



PLEASENT 

POSESSIVE 






POSSESIVE POTATOS PRACTICLY PRARIE PRECEED PRECICELY 

PREEMIUM PREFERED PREFERING PREFORMANCE PREPERATION PRESENSE 



PRESTEGE

PRIVILEDGE



PREVELANT 

PROBIBLY 






PROMINANT 
RADIOES



PROOVE 

RAIDO 



PREVELENT

PROCEEDURE 

PURSUADE 

RECCOMEND 



PRIMATIVE 

PROFFESION 

QUANOITY 

RECEVE 



PRIMATIVE PRISINER 
PROFFESSION PROFFESSOR
QUESTIONAIRE QUIZES



RECIEVING

REHERSAL 



RECOMEND 

RELEGION



RECONIZE 

RECEIVE 



REFERANCE 

RECEIVING 



RECEVING 

REFERED 

RELETIVE 



RECIEVE 

REFERING 

RELEVENT 



RELIGEON RELITIVE REMBER
REPITITION REPRESENTITI RESPONCE



REMEMBERANCE REPITION
RESTARANT RESTARONT 



REPITITION 

RESTURANT 



RESTURONT 

ROOMATE 



REVEEL 

RYTHM 



RIDACULE 

SAFTY 



RIDECULE 

SATERDAY 



RIDECULING 

SAYED



SECRETERY SEIGE SENCE 

SEPERATING SEPERATION SERVENT 



SENE 

SEVEREL 



SENTANCE 

SHEPERD



RIGHTOUS 

SECRETERIES 

SEPERATE 

SHERRIFF 



SHERRIF 

SIGNITURE 



SOUVINIR 

STORED 



SHINNING 

SIMILER 

SPAGETTI 

STORING 



SHINNY 

SIMPEL 



SIEZE 

SINCERLY 



SIEZING 

SOCIATY 



SPEACH 

STORYS 



SPONSER 

STRECH 



STEPED 

STRENTH 



SIGNIFICENCE 

SOPHMORE 

STOMACK 

STUBORN 



STUDING SUBTILE SUBUB SUCEED SUCESS SUGER 

SUGEST SUMARIES SUMARY SUPERINTENDA SUPORT SUPOSE



SUPRESS 

SURJERY 



SUPRISE 

SURJON 



SUPRISING 

SUROUND



SURBURB

SURPRIZE 



SURGURY 

SUSPENCE 



SUTLE 

SYMBOLE

TEMPORERILY 

THANKYOU 



SUVENIR 

JAj UNT 

TEMPORERY 

THERFORE 



SWIMING 
TECNIQUE 
TENDANCY 
THIER 



SURGON 

SURPRIZING 

SWIMING SYMBEL SYMBLE 

TEMPERERY TEMPERMENT TEMPERTURE

TERRABLY 



TENDANCY 

THOUSEND 



TERRABLE 
THROUGHLY TOLERENCE



TOLERENT TOMATOS 



TOMMOROW TOMMORROW TRAGADIES



TRAGADY
TRUELY 



1 


TURKIES 


TYRANY 


UNATURAL 


UNECESSARY 


UNTILL 


USEING 


1 


USFUL 


USLESS 


VACCUM 


VACCUUM 


VALUBLE 


VARIOS


1 


VARIUS 


VEGATABLE 


VEGTABLE 


VEIW 


VENGENCE


VILLIN


1 


WENOOCY 


WENEDSDAY


WETHER 


WICH 


WIEGHT 


WIERD


i 


WRITEN 


WRITTING


YEILD


HOW 


WHAT 


WHERE 


1 


WHICH 


WHOM 


WHO 


WHOSE 


ZZZZZZZZZZZZZZZZZZZZZZZZ 


s 


ZZZZZZZZZZZZAFRICAN 


AFRICA 


ALL 


ANOTHER 


AN 


1 


ANYBODY 


ANYONE 


ANY 


A 


BLOOD 


BOTH


1 


CAPTAIN 


CHINA 


CHINESE 


CIVILIZATION COLLEGE


COLONEL 


1 


CORPORAL 


COURAGE 


CROME 


DAVID 


OR 


EACH 1 




EIGHT 


EITHER 


ENGLAND 


ENGLISH 


EVERYBODY 


EVERYONE


1 


EVERY 


FEW 


FIVE 


FOOD 


FOUR 


FRANCE 1 


1 - 

1 


FRENCH 


GENERAL 


GEORGE 


GERMAN 


GERMANY 


GOVERNOR 
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APPENDIX A (Continued) 
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'STX'^ 


SOCIETY


SOMEBODY 


SOMEONE 


r SOME 


SPAIN 


SPANISH 


SUCH 


_SUE 


SURVIVAL 


1 TELEVISION 


TEN 


THAT


THEIR 


THE 


THESE
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WATER 


WE 


WHOLE 


WILLIAM


WOMEN 


1 YOUR 


YOU 


zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 


1 'ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ 


i ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ 


i "ZZZZZZZZZZZZZZZZZZ2ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ2ZZZ2ZZZZ 1 


1 llllllllllll 








i 


1 ACROSS 










i 


1 AS 










J 


r HAS 










I 


i IS 












1 LESS 












1 THUS 












f UNLESS 












1 WAS 












1 YES 













I 



t 



llllllllllll __ __ _ 

99 1913123311101034131 7182021161519149999999999999999999999999999999 

99999 

$IBSYS














APPENDIX B



TABLE IV-11 (A)

Predictor n = 25

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS
COMPUTED FROM WHERRY FORMULA
(See Chapter IV)

Discovered                  Sample Size
  MULTR     100  125  150  175  200  225  250  275  300

   .50       00   25   31   35   38   39   41   42   43
   .51       10   27   33   37   39   41   42   43   44
   .52       15   29   35   38   41   42   43   44   45
   .53       19   32   37   40   42   44   45   46   46
   .54       23   34   39   42   44   45   46   47   48
   .55       26   36   40   43   --   46   47   48   49
   .56       29   37   42   45   46   48   49   49   50
   .57       31   39   43   46   48   49   50   51   51
   .58       34   41   45   47   49   50   51   52   53
   .59       36   43   47   49   50   52   52   53   54
   .60       38   45   48   50   52   --   --   --   --
   .61       40   46   50   52   53   54   55   56   56
   .62       42   48   51   53   54   55   56   57   57
   .63       44   49   52   54   56   57   57   58   58
   .64       46   51   54   56   57   58   59   59   60
   .65       48   53   --   57   --   --   60   60   61
   .66       49   54   57   58   60   60   61   62   62
   .67       51   56   58   60   61   62   62   63   63
   .68       53   57   60   61   62   63   63   64   64
   .69       55   59   61   62   63   64   65   65   65
   .70       56   60   62   64   65   --   66   66   --
   .71       58   --   64   65   66   66   67   67   68
   .72       60   63   65   66   67   68   68   69   69
   .73       61   64   66   67   68   69   69   70   70
   .74       63   66   68   69   69   70   71   71   71
   .75       64   67   69   70   71   71   72   72   72
   .76       --   69   70   71   72   72   73   73   73
   .77       67   70   71   72   73   74   74   74   75
   .78       69   71   73   74   74   75   75   75   76
   .79       71   73   74   75   76   76   76   77   77
   .80       72   74   75   76   77   77   77   78   78
APPENDIX B



TABLE IV-11 (B)

Predictor n = 30

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS
COMPUTED FROM WHERRY FORMULA
(See Chapter IV)

Discovered                  Sample Size
  MULTR     100  125  150  175  200  225  250  275  300

   .50       00   10   25   31   34   37   38   40   41
   .51       00   15   27   33   36   38   40   41   42
   .52       00   19   29   34   37   40   41   43   43
   .53       00   23   32   36   39   41   43   44   45
   .54       00   26   34   38   41   43   44   45   46
   .55       00   28   36   40   42   44   45   47   47
   .56       12   31   37   41   44   46   47   48   49
   .57       18   33   39   43   45   47   48   49   50
   .58       22   35   41   45   47   48   50   50   51
   .59       25   37   43   46   48   50   51   52   52
   .60       29   39   45   48   50   51   52   53   54
   .61       31   41   46   49   51   52   53   54   55
   .62       34   43   48   51   52   54   55   56   56
   .63       37   45   49   52   54   55   56   57   57
   .64       39   47   51   54   55   56   57   58   59
   .65       41   49   53   55   57   58   59   59   60
   .66       44   --   54   --   58   59   60   61   61
   .67       46   52   56   58   59   60   61   62   62
   .68       48   54   57   59   61   62   62   63   63
   .69       50   56   59   61   62   63   64   64   65
   .70       52   57   60   62   63   64   65   65   66
   .71       54   59   62   63   65   65   --   67   67
   .72       56   60   63   65   66   67   67   68   68
   .73       57   62   64   66   67   68   68   69   69
   .74       59   64   66   67   68   69   70   70   71
   .75       61   65   67   69   70   70   71   71   72
   .76       63   --   69   70   71   72   72   73   --
   .77       64   68   70   71   72   73   73   74   74
   .78       66   70   71   73   73   74   74   75   75
   .79       68   71   73   74   75   75   76   76   76
   .80       69   72   74   75   76   76   77   77   77
APPENDIX B



TABLE IV-11 (C)

Predictor n = 35

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS
COMPUTED FROM WHERRY FORMULA
(See Chapter IV)

Discovered                  Sample Size
  MULTR     100  125  150  175  200  225  250  275  300

   .50       00   00   14   25   29   33   36   37   39
   .51       00   00   18   27   32   35   37   39   40
   .52       00   00   22   29   34   37   39   40   42
   .53       00   00   24   32   36   38   40   42   43
   .54       00   11   27   34   37   40   42   43   44
   .55       00   17   30   --   --   42   43   45   46
   .56       00   21   32   38   41   43   45   46   47
   .57       00   24   34   39   43   45   46   47   48
   .58       00   27   36   41   44   46   48   49   50
   .59       00   30   38   43   46   48   49   50   51
   .60       10   33   40   45   47   49   51   52   52
   .61       --   35   41   46   49   51   52   53   54
   .62       22   38   44   48   50   52   53   54   55
   .63       26   40   46   50   52   53   55   56   56
   .64       29   42   48   51   53   55   56   57   58
   .65       33   44   50   53   55   56   57   58   59
   .66       --   46   51   54   56   58   59   59   60
   .67       38   48   53   56   58   59   60   61   61
   .68       41   50   55   57   59   60   61   62   63
   .69       44   52   56   59   60   62   62   63   64
   .70       46   54   58   60   62   63   64   64   65
   .71       48   55   59   62   63   64   65   66   66
   .72       51   57   61   63   64   66   66   67   67
   .73       53   59   62   64   66   67   68   68   69
   .74       55   61   64   66   67   68   69   69   70
   .75       57   62   65   --   68   69   70   71   --
   .76       59   64   67   69   70   71   71   72   72
   .77       61   66   68   70   71   72   73   73   73
   .78       63   67   70   71   72   73   74   74   75
   .79       65   69   71   73   74   74   75   75   76
   .80       67   71   73   74   75   76   76   77   77
APPENDIX B



TABLE IV-11 (D)

Predictor n = 40

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS
COMPUTED FROM WHERRY FORMULA
(See Chapter IV)

Discovered                  Sample Size
  MULTR     100  125  150  175  200  225  250  275  300

   .50       00   00   00   16   25   29   33   35   37
   .51       00   00   00   20   27   32   34   37   38
   .52       00   00   05   23   29   33   36   38   40
   .53       00   00   13   26   32   35   38   40   41
   .54       00   00   18   28   34   37   40   41   43
   .55       00   00   22   --   36   39   41   --   44
   .56       00   00   25   33   38   41   43   --   46
   .57       00   06   28   35   39   42   44   46   47
   .58       00   14   30   37   41   44   46   47   48
   .59       00   19   33   39   43   45   47   49   50
   .60       00   24   35   41   45   47   49   50   51
   .61       00   27   38   43   46   49   50   51   52
   .62       00   30   40   45   48   50   52   53   54
   .63       00   33   42   47   50   52   53   54   55
   .64       10   36   44   48   51   53   54   56   56
   .65       18   38   46   --   --   --   56   57   58
   .66       23   41   48   52   54   56   57   58   59
   .67       27   43   50   53   56   57   59   60   60
   .68       31   45   51   55   57   59   60   61   62
   .69       35   48   53   57   59   60   61   62   63
   .70       38   50   55   58   60   62   --   63   --
   .71       41   52   57   60   62   63   64   65   --
   .72       44   54   58   61   63   64   65   66   67
   .73       47   56   60   63   64   66   67   67   68
   .74       49   58   62   64   66   67   68   69   69
   .75       52   60   63   66   67   68   69   70   70
   .76       54   61   65   67   69   70   70   71   --
   .77       56   63   67   69   70   71   72   72   73
   .78       59   65   68   70   71   72   73   74   74
   .79       61   67   70   72   73   74   74   75   75
   .80       63   68   71   73   74   75   76   76   76



APPENDIX B
TABLE IV-11 (E)



Predictor n = 45



Discovered 

MULTR



.50 

.51 

.52 

.53 

.54 



SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS
COMPUTED FROM WHERRY FORMULA
(See Chapter IV) 



100 125 150 



Sample Size
175 200 225 



250 275 300 



00 

00 

00 

00 

00 



00 

00 

00 

00 

00 



00 

00 

00 

00 

00 



00 

04 

13 

17 

21 



18 

21 

24 

27 

29 



25 

27 

29 

32 

34 



29 

31 

33 

35 

37 
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32 

34 

36 

37 
39 



67 


“T9 


70 


70 


69 


70 


71 


72 


70 


71 


72 


73 


71 


73 


74 


74 


73 


74 


75 


75 



34 

36 

3B 

39 

41 



• 

Tso 


uu 

00 


uu 

00 


13 




34^ 


38 


40 


42 


44 


.57 


00 


00 


18 


30 


36 


39 


42 


44 


45 


• 58 


00 


00 


22 


32 


38 


41 


44 


45 


47 


.59 


00 


00 


26 


35 


40 


43 


45 


47 


48 


•60 


00 


00 


29 








LA 


<50 


yU 

51 




APPENDIX B



TABLE IV-11 (F)

Predictor n = 50

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS
COMPUTED FROM WHERRY FORMULA
(See Chapter IV)

Discovered                  Sample Size
  MULTR     100  125  150  175  200  225  250  275  300

   .50       00   00   00   00   00   19   25   29   32
   .51       00   00   00   00   11   22   27   31   35
   .52       00   00   00   00   16   25   30   33   35
   .53       00   00   00   00   20   27   32   35   37
   .54       00   00   00   08   23   30   34   37   39
   .56       00   00   00   19   29   34   38   40   42
   .57       00   00   00   23   31   36   39   42   44
   .58       00   00   04   26   34   38   41   43   45
   .59       00   00   14   29   36   40   43   45   47
   .60       00   00   19   32   38   42   --   --   --
   .61       00   00   23   34   40   44   46   48   50
   .62       00   00   27   37   42   46   48   50   51
   .63       00   00   30   40   44   47   50   51   53
   .64       00   10   33   41   46   49   51   53   54
   .65       00   18   --   44   48   --   52   --   --
   .66       00   23   39   46   50   52   54   56   57
   .67       00   28   41   48   51   54   56   57   58
   .68       00   31   44   50   53   55   57   59   60
   .69       00   35   46   51   55   57   59   60   61
   .70       00   38   48   --   --   --   60   61   62
   .71       00   41   50   55   58   60   62   63   64
   .72       16   44   52   57   60   61   63   64   65
   .73       24   47   55   59   61   63   64   65   66
   .74       29   49   56   60   63   65   66   67   68
   .75       34   52   58   62   64   66   67   68   69
   .76       38   54   60   64   66   68   69   70   70
   .77       42   56   62   65   68   69   70   71   71
   .78       46   59   64   67   69   70   71   72   73
   .79       49   61   66   69   71   72   73   74   74
   .80       52   63   68   70   72   73   74   75   75
APPENDIX B



TABLE IV-11 (G)

Predictor n = 55

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS
COMPUTED FROM WHERRY FORMULA
(See Chapter IV)

Discovered                  Sample Size
  MULTR     100  125  150  175  200  225  250  275  300

   .50       00   00   00   00   00   08   19   25   28
   .51       00   00   00   00   00   14   22   27   31
   .52       00   00   00   00   00   18   25   30   33
   .53       00   00   00   00   08   22   28   32   34
   .54       00   00   00   00   15   25   30   34   36
   .55       00   00   00   00   --   27   32   36   38
   .56       00   00   00   00   23   30   35   38   40
   .57       00   00   00   11   26   32   37   39   42
   .58       00   00   00   17   29   35   39   41   43
   .59       00   00   00   22   31   37   40   43   45
   .60       00   00   00   25   34   39   42   45   46
   .61       00   00   07   29   36   41   44   46   48
   .62       00   00   16   32   39   43   46   48   50
   .63       00   00   21   34   41   45   48   50   51
   .64       00   00   25   37   43   47   49   51   53
   .65       00   00   29   --   45   48   51   53   --
   .66       00   00   32   42   47   50   53   54   56
   .67       00   10   36   44   49   52   54   56   57
   .68       00   18   38   46   51   54   56   57   58
   .69       00   24   41   48   53   55   57   59   60
   .70       00   29   44   --   --   --   --   60   61
   .71       00   33   46   52   56   59   60   62   63
   .72       00   37   49   54   58   60   62   63   64
   .73       00   40   51   56   60   62   63   64   65
   .74       00   43   53   58   61   63   65   66   67
   .75       --   46   --   60   63   --   66   --   68
   .76       22   --   --   62   65   --   68   69   69
   .77       29   52   60   64   66   68   70   70   71
   .78       34   54   62   65   68   69   71   71   72
   .79       39   57   64   67   69   71   72   73   73
   .80       44   59   66   69   71   72   73   74   75
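
The entries in Tables IV-11 (A) through (G) appear consistent with one common form of the Wherry shrinkage correction, in which the corrected squared coefficient is 1 - (1 - R^2)(N - 1)/(N - n - 1) and a negative corrected value is tabled as 00. The formula itself is given in Chapter IV rather than in this appendix, so that form is an assumption here, as are the Python function and argument names. A minimal sketch for recomputing a single cell:

```python
import math


def wherry_shrunken_r(multr, sample_size, n_predictors):
    """Shrunken multiple R under the assumed Wherry form.

    Assumes R_hat^2 = 1 - (1 - R^2) * (N - 1) / (N - n - 1).
    A non-positive corrected R^2 is returned as 0.0 (tabled as 00).
    """
    denominator = sample_size - n_predictors - 1
    if denominator <= 0:
        raise ValueError("sample size must exceed the predictor count by more than 1")
    corrected = 1.0 - (1.0 - multr ** 2) * (sample_size - 1) / denominator
    return math.sqrt(corrected) if corrected > 0.0 else 0.0


def table_entry(multr, sample_size, n_predictors):
    """Two-digit table entry: the shrunken R times 100, rounded."""
    return f"{round(100 * wherry_shrunken_r(multr, sample_size, n_predictors)):02d}"
```

Under that assumption, `table_entry(0.50, 100, 25)` and `table_entry(0.50, 125, 25)` give the first two entries of the MULTR .50 row for 25 predictors. Cells that sit very close to a rounding boundary may differ from the printed table by one unit.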
APPENDIX C



COMPUTER PROGRAM
Written by Donald Marcotte



$IBFTC PHRAS

SUBROUTINE PHRASE
COMMC^/lNUD/lKDniD 
CC3M«»/lM/BROIUP,TECr,LEICTH 
CC)MMCa/PSUM/R£liIiN£rr 
CCIM0H>1«APH/PHRMAT , _ 

INTEGER RC(12),ITEXT(200),PHRMAT(300,8)

COMMON/ACC/TOTAL,ID
INTEGER THRZS,ASTRK,QUOTE

COMMON/QTE/NQUOTE

DATA PERIOA,EXCLAA,QUESTA/2H.*,3H.X*,3H.W
DATA COMMAA/1H,/

INTEGER COMMAA,PERIOA,EXCLAA,QUESTA

DATA ASTRK/1H*/

DATA THRZS/3HZZZ/

REAL HLFTXT(200)

DOUBLE PRECISION TEXT(100),IMDIW)(254)

EQUIVALENCE(HLFTXT,TEXT,ITEXT)

LOGICAL IBTABL 

REAL EROKUP(aO) ■ 

INTEGER LENGTH(100),RELH(30),NEXT,TOTAL

NQUOTE = 0

LCSENT = 0

QUOTE = 0

IPE = 0

INEir « 2*NEXr • 4 
DO 5 ISENT = 1,INEXT
IF(ISENT.LT.IPE) GO TO 5
IC8ENT » ISENT r ISBfT/2 
IP(TGSEIIT.«*LCSElir) 00 TO 5; 

ffoSn(^53i.cowu.aR.ran(iam).«.mio4.«.iiai(isBB 

.aPHro.508) ) oo to ii8 

00 TO 5 

IF(ITEXT(ISENT).EQ.PHRMAT(IPB,1)) GO TO 44
GO TO 4

44 ISBW'O isan.-i-ii 

IF(ITEXT(ISPOW).EQ.PHRMAT(IPB,2)) GO TO 7
4 CONTINUE
GO TO 5

7 IPE = ISPOW + 1
KACC = 2

RC(1) = PHRMAT(IPB,1)

RC(2) = PHRMAT(IPB,2)

IPC = IPB
DO 13 IIPC = 3,8



APPENDIX C (Continued)

IF(ITEXT(IPE).EQ.PHRMAT(IPC,IIPC)) GO TO 21
IF(PHRMAT(IPC,IIPC).EQ.THRZS) GO TO 23
IF(KACC.EQ.2.OR.KACC.EQ.4.OR.KACC.EQ.6) GO TO 84
GO TO 23

21 KACC = KACC + 1

RC(KACC) = PHRMAT(IPC,IIPC)

IPE = IPE + 1
13 CONTINUE
GO TO 23

84 IPB = IPB + 1

IF(ITEXT(ISENT).EQ.PHRMAT(IPB,1)) GO TO 49
IPE = ISENT + 2
GO TO 5

49 IF(ITEXT(ISPOW).EQ.PHRMAT(IPB,2)) GO TO 7
IPE = ISENT + 2
GO TO 5

23 ICFAST = ISENT - 2
LCFAST = IPE
IWOP = KACC

IF(ITEXT(ICFAST).EQ.ASTRK) GO TO 113
GO TO 221

113 IF(ITEXT(LCFAST).EQ.ASTRK) GO TO 114

IF(ITEXT(LCFAST).EQ.COMMAA.OR.ITEXT(LCFAST).EQ.PERIOA.OR.ITEXT(LCF
1AST).EQ.EXCLAA.OR.ITEXT(LCFAST).EQ.QUESTA) GO TO 114
ICAC = LCFAST + 2

IF(ITEXT(LCFAST).EQ.COMMAA.AND.ITEXT(ICAC).EQ.ASTRK) GO TO 114
GO TO 221

114 QUOTE = 1

NQUOTE = NQUOTE + 1

221 WRITE(7,223) ID,IPC,QUOTE,(RC(IRC),IRC=1,IWOP)

223 FORMAT(5X,I5,5X,I5,5X,I5,5X,8A6)

TOTAL = TOTAL + 1

WRITE(6,923) (RC(IRC),IRC=1,IWOP)

923 FORMAT(12H0PHRASE IS  ,12A6)

DO 62 IRCA = 1,8
62 RC(IRCA) = 0
QUOTE = 0
5 CONTINUE
RETURN
END
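
The control flow of subroutine PHRASE can be paraphrased in a modern language. The sketch below is a simplified reading of the listing, not a translation of it: each row of the phrase table holds up to eight words padded with the sentinel ZZZ (as PHRMAT appears to), a match is recorded when every word of a row is found in sequence, and a phrase is flagged as quoted when it is preceded by an asterisk and followed by an asterisk, optionally after one punctuation mark. The function name, the token lists, and the exact quotation rule are assumptions, and the FORTRAN storage layout of the text array is not reproduced.

```python
from itertools import takewhile

SENTINEL = "ZZZ"                      # pads unused slots in a phrase row
PUNCTUATION = {",", ".", "!", "?"}    # assumed stand-ins for COMMAA, PERIOA, ...


def find_phrases(tokens, phrase_table):
    """Return (start_index, row_index, quoted) for each phrase found."""
    hits = []
    i = 0
    while i < len(tokens):
        matched = False
        for row, slots in enumerate(phrase_table):
            # Words of the phrase are the slots before the first sentinel.
            words = list(takewhile(lambda w: w != SENTINEL, slots))
            end = i + len(words)
            if words and tokens[i:end] == words:
                before = tokens[i - 1] if i > 0 else None
                after = tokens[end] if end < len(tokens) else None
                after2 = tokens[end + 1] if end + 1 < len(tokens) else None
                quoted = before == "*" and (
                    after == "*" or (after in PUNCTUATION and after2 == "*")
                )
                hits.append((i, row, quoted))
                i = end               # resume scanning after the phrase
                matched = True
                break
        if not matched:
            i += 1
    return hits
```

The first matching row wins at each position, so the order of rows in the table matters when one phrase is a prefix of another.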
APPENDIX D



PL/I PROGRAM PARSE
Written by Gerald Fisher




R# ← 0

LEN ← 0






Z 4** ^ 
Z ← S






i ← 1










R# is an array. R#(i) is
the rule number applied
when parsing the i th
letter.

LEN is an array. LEN(i) is
the length of the rule
applied to word i.

Z is the pushdown store;
rather, Z gives the topmost
symbol on the PDS.





Have we checked to
see if any rule is
applicable, i.e.,
is there a rule
whose left hand side
is Z and handle is
the i th word of
the sentence?








APPENDIX D (Continued)



RULE MATCH 




NO MATCH 



0 



LEN(i) ←
length of rule
R#(i) ← j
(jth rule applies)




Remove top of
PDS, i.e.,

FREE Z



Add more
terminals in
rule following
handle to PDS



R#(i) ← 0
LEN(i) ← 0
i ← i-1




Remove
predictions
made by R#(i)



Put the left
hand side of
R#(i) back &
TRY AGAIN



i 4-i-l 




0 



Unsuccessful Parse
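
The flowchart above describes a predictive parse that uses a pushdown store (PDS) and backs up when no rule applies. A minimal Python sketch of that control flow follows. The rule format (left-hand side, handle, remaining predicted symbols), the toy grammar, and every name in the block are assumptions made for illustration and are not taken from the PL/I program.

```python
def parse(words, rules, start="S"):
    """Return the 1-based rule number applied at each word, or None.

    Each rule is (lhs, handle, rest): it applies when lhs is on top of the
    pushdown store and handle equals the current word; rest is then predicted.
    """
    pds = [start]                  # pushdown store, top at the end of the list
    rule_no = [0] * len(words)     # plays the role of R#(i); 0 means "none"
    i = 0                          # index of the current word
    first = 0                      # first rule index still untried at word i
    while True:
        if i == len(words) and not pds:
            return rule_no         # every word consumed, no predictions left
        match = None
        if i < len(words) and pds:
            for k in range(first, len(rules)):
                lhs, handle, _ = rules[k]
                if lhs == pds[-1] and handle == words[i]:
                    match = k
                    break
        if match is not None:
            rest = rules[match][2]
            pds.pop()                        # remove top of PDS
            pds.extend(reversed(rest))       # leftmost predicted symbol on top
            rule_no[i] = match + 1
            i += 1
            first = 0
        else:
            if i == 0:
                return None                  # nothing left to undo
            i -= 1                           # back up one word
            k = rule_no[i] - 1
            lhs, _, rest = rules[k]
            del pds[len(pds) - len(rest):]   # remove predictions made by rule k
            pds.append(lhs)                  # put its left-hand side back
            rule_no[i] = 0
            first = k + 1                    # try again with a later rule


# Hypothetical toy grammar over word-class labels, for illustration only.
RULES = [
    ("S", "DET", ("N", "VP")),
    ("N", "NOUN", ()),
    ("VP", "VERB", ()),
    ("VP", "VERB", ("OBJ",)),
    ("OBJ", "DET", ("N",)),
]
```

With this grammar, a sentence such as DET NOUN VERB DET NOUN first tries the shorter verb-phrase rule, runs out of predictions with words remaining, backs up, and succeeds with the longer rule, which is the "TRY AGAIN" path of the flowchart.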
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